• Open

    [D] Any tips for handling sparse, discrete, temporal data?
    Hi, I'm wondering if anyone could help familiarize me with modern feature engineering methods for handling this type of data in the context of machine learning. Background The data I'm talking about is hospital/facility utilization data for patients. Below is a table that is similar to the data I have for each patient. In this example we can see that this single patient went to the ED on the 5th day, was admitted into the hospital on the 5th day, and was admitted to a SNF (Skilled Nursing Facility -basically a nursing home) on the 8th day: ED Hospital SNF Day 1 0 0 0 Day 2 0 0 0 Day 3 0 0 0 Day 4 0 0 0 Day 5 1 1 0 Day 6 0 0 0 Day 7 0 0 0 Day 8 0 0 1 Day 9 0 0 0 In reality, we have more types of facilities as columns and years worth of days. Problem…  ( 2 min )
    [P] Deploying my MIDI generator
    Not sure if this is the best place for this, as I am not a Machine Learning engineer. I have developed a web application that lets you generate and remix MIDI with NLP Transformers. Right now it runs on Flask. The model is fairly small as far as transformers go (500Mb) and doesn't use very much memory when running. It is only doing inference and so it runs for about 5-15s while processing a job, containerized it uses less than 4Gb memory at all times. I am trying to figure out how to deploy my Flask server to EC2 without incurring massive compute costs. It seems like all of the instances with GPU or Spot GPU support are prohibitively expensive and provide way more memory and local storage than I would need. Have any of you encountered a similar problem? Are there any instances that you suggest? Project here: https://github.com/pickles976/chiptune-ai submitted by /u/EuphoricFedoria [link] [comments]  ( 1 min )
    [D] Democratizing Diffusion Models - LDMs: High-Resolution Image Synthesis with Latent Diffusion Models, a 5-minute paper summary by Casual GAN Papers
    Diffusion models (DMs) have a more stable training phase than GANs and less parameters than autoregressive models, yet they are just really resource intensive. The most powerful DMs require up to a 1000 V100 days to train (that’s a lot of $$$ for compute) and about a day per 1000 inference samples. The authors of Latent Diffusion Models (LDMs) pinpoint this problem to the high dimensionality of the pixel space, in which the diffusion process occurs and propose to perform it in a more compact latent space instead. In short, they achieve this feat by pertaining an autoencoder model that learns an efficient compact latent space that is perceptually equivalent to the pixel space. A DM sandwiched between the convolutional encoder-decoder is then trained inside the latent space in a more computationally-efficient way. In other words, this is a VQGAN with a DM instead of a transformer (and without a discriminator). As for the details, let’s dive in, shall we? Full summary: https://t.me/casual_gan/293 Blog post: https://www.casualganpapers.com/high-res-faster-diffusion-democratizing-diffusion/Latent-Disffusion-Models-explained.html Latent Diffusion Models arxiv / code Join the discord community and follow on Twitter for weekly AI paper summaries! submitted by /u/KirillTheMunchKing [link] [comments]  ( 1 min )
    [D] Training set size of SOTA models?
    Hi, I'm trying to get an idea of just how large datasets for big ML models, such Dall-e are. Does anyone have some references? GPT-3 is apparently trained on 45TB of data: [https://www.springboard.com/blog/data-science/machine-learning-gpt-3-open-ai](https://www.springboard.com/blog/data-science/machine-learning-gpt-3-open-ai) ​ How large are other big datasets? submitted by /u/rk3000 [link] [comments]
    [D] How to train large CLIP-style models?
    I'd like to train a medium-large contrastive model similar to CLIP. Something that will fit onto a single GPU but for which I can only compute a few batch elements at a time. The issue is that these types of models rely on other batch elements for the "contrastive" part of the equation. Bigger batch sizes lead to faster (and better) convergence. Is anyone aware of any papers (or have any hands-on tips) that address this issue? submitted by /u/neonbjb [link] [comments]  ( 1 min )
    [D] Focus on the Process: Formulating AI Ethics Principles More Responsibly
    Hi there, there is a new Gradient article some of you may find interesting: Focus on the Process: Formulating AI Ethics Principles More Responsibly Here's a preview of what it's about: It is tempting to respond to the present state in AI ethics by abandoning searches for principles. Given that there are so many principles out there already and so few tools to operationalize them, organizations might be inclined to simply use some of the existing principles and focus their attention on operationalization. However, a difficult question to answer is which principles to use. How do we know that organizations will choose well? What is to prevent them from cherry picking, for example? One way to go is to sift through the existing literature, looking for universal AI ethics principles. The hope might be that if we find universal principles, they could guide the development and evaluation of AI systems everywhere. Organizations that develop AI systems could focus on operationalizing them. Those who evaluate AI systems, such as investors, regulators, auditors, and consumers, could examine AI systems based on these principles. I advise against this approach. In this article, I explain why it is unlikely that universal AI ethics principles will be found and I discuss reasons to avoid using dominant trends as default. Instead, I suggest that each organization should articulate its own AI ethics principles, and I sketch ways to do so responsibly. Enjoy! submitted by /u/regalalgorithm [link] [comments]  ( 1 min )
    [D] Are multilayer perceptrons reversible?
    I have a multilayer perceptron that maps input X to output Y. Three hidden layers, with "tanh" activations in each, except the layer with linear (identity) activation leading the output Y. Can I reverse the network such that it would map Y to X? submitted by /u/boleslev [link] [comments]  ( 1 min )
    [D] Anyone working on Explanable AI?
    I am currently working at ML@Uber Is anyone here dealing with the problem of explanable AI? i.e. how are you able to understand and interpret predictions made by your machine learning models. Anyone here facing this problem or already have solved this problem? submitted by /u/gauravapiscean [link] [comments]  ( 1 min )
    [D] GCP compute enging pricing question
    I'm currently doing a little personal project which involves training a VQVAE (basically a conv NN and a variational autoencoder). I've got it all up and running after a few weeks of figuring things out and I'm finding that I'm being rinsed on costs. My basic setup is a c2-standard-4 VM (4 v-cpus and 16GB ram) and a 100GB disk mounted, I'm also spinning down the VM when not in use so I don't get a discount for continual usage. I started with an n1-standard-1 but found myself quickly running out of RAM, which still happens if my batch size is too high but less frequently. The disk is necessary as I'm dealing with large datasets but doesn't seem to be costing me much. As of right now my costs are about $4.5 per day, which obviously isn't that high, but for a personal project the costs soon rack up if I'm running this for days at a time every week. On my current setup I can run a batch size of 8, any higher than this and I run out of memory, and an epoch takes about 5 hours. The point I'm struggling with at the moment is can I be better optimising CPU vs runtime, i.e. is it worth swapping to slower/fewer but cheaper CPUs, or will it cost me just as much because they run longer? The other thing I'm not sure about is if I can gain much by using other platforms, my current setup isn't really that tied into the GCP ecosystem so I can switch fairly easily. I realise my questions are fairly general and everyone's setups are different, so I'm just looking for a point in the right direction if nothing else. Thanks! Edit: also if there's anywhere that can give me a decent amount of free hours I'd also be interested in that too! submitted by /u/Batteredcode [link] [comments]  ( 3 min )
    [D] Client-Side Caching Improves Feature Store Performance by 70% at DoorDash
    To enable our platform to support hundreds of data driven models and produce billions of model predictions we build a robust ML platform, feature store and prediction engine. This was only the beginning as the feature store at the heart of the platform utilized multiple TB's of memory in large Redis clusters, which needed to be optimized for cost and fast loading times for the optimal customer experience. To improve the feature store performance we implemented a caching layer but still needed to choose the best caching library, implement this solution and analyze the platform to set up experiments that would validate the new approach. I wanted to share this journey with the developer community so they can learn from my experience and how I was able to improve feature store performance by 70% at DoorDash. Please check out the article and let me know your thoughts on my approach: https://doordash.engineering/2022/05/03/how-we-applied-client-side-caching/ submitted by /u/csko7 [link] [comments]  ( 1 min )
    Twitter algorithm already open source?[D]
    I've been seeing some people talking about how elon's going to make twitter's algorithm open source but isn't it already publicly available? At least for Who To Follow WTF so I don't get why people are losing their minds over this. submitted by /u/lehmanmafia [link] [comments]  ( 2 min )
    [R] [D] SERF activation function - improving Swish
    https://arxiv.org/abs/2108.09598 Paper looks very promising, what do you think? Anyone tried SERF yet? submitted by /u/Shronnin [link] [comments]  ( 1 min )
    Sound clip Recommendation Model [D]
    Hello everyone! So I am starting a project where I need to build a model that can identify segments of audio that detect “problems with call center agents”. I have around 65,000 pieces of audio all that have had segments that have been labeled. I also have the transcribed text and the exact times the where the audio was segmented. I am thinking of starting with Data2Vec. I would like to hear your opinions on how you would approach this. Thanks in advance! submitted by /u/JS-AI [link] [comments]  ( 1 min )
    [D] Why are graph neural networks applied to non-graph structured data?
    Recently I have come across several papers that use graph neural networks for processing images, audio and video data. I can understand why GNNs are needed for data that have a graph structure (like molecules or traffic network etc). But for images and audio, what is the difference between using GNNs versus normal DNN? submitted by /u/Far_Conversation_445 [link] [comments]  ( 2 min )
    [D] Would an ML Ops platform be useful?
    Hey folk, I was wondering, would a ML Ops platform be something the community would embrace? A tool where from a CLI or GUI you can easily deploy models in a productive environment, or expose them as microservices? Something that solves the infrastructure bits and configs is what I wonder.. Or do people prefer to simply handle their own infra or manage on premises machines? submitted by /u/zedrakk [link] [comments]  ( 1 min )
    [D] Handling Missing Data or Encoding/Scaling. What comes first?
    Just out of curiosity, I just want to know whether you handle missing data before scaling/encoding of the dataset or after it. Thank you, submitted by /u/trj_flash75 [link] [comments]  ( 1 min )
    [R] Meta is releasing a 175B parameter language model
    submitted by /u/StellaAthena [link] [comments]  ( 2 min )
  • Open

    Is it correct to say that a nn with n layers is an nth order approximation?
    submitted by /u/HasFiveVowels [link] [comments]
    Stanford has trained AI to classify proteins
    submitted by /u/aidev2040 [link] [comments]
    Hi, im looking to build a neural network from scratch for a recommender system, that attempts to recommend anime programmes, could someone point me to relevant papers/books/resources?
    Hi, So for my final year project, I decided that im going to code a neural network from scratch for a recommender system, however I dont know the first thing about this topic so im going to be starting from the ground up. Could anyone point me in the right direction and recommend some resources etc? thanks! submitted by /u/tvvvs [link] [comments]  ( 1 min )
    CNN ERROR Inconsistent number of samples
    I am building a CNN network to detect DGA,DNS but i have encountered this problem and just don't know how to fix it. https://preview.redd.it/mnkfhb2cn8x81.png?width=719&format=png&auto=webp&s=ac1ec84b0721f43f36d0548173d12642b7ba0f26 submitted by /u/Dangerous_Intern_510 [link] [comments]
    7+ Best Books to Learn Neural Networks in 2022 for Beginners (Updated)
    submitted by /u/maneesh123456 [link] [comments]
  • Open

    Democratizing Diffusion Models - LDMs: High-Resolution Image Synthesis with Latent Diffusion Models, a 5-minute paper summary by Casual GAN Papers
    Diffusion models (DMs) have a more stable training phase than GANs and less parameters than autoregressive models, yet they are just really resource intensive. The most powerful DMs require up to a 1000 V100 days to train (that’s a lot of $$$ for compute) and about a day per 1000 inference samples. The authors of Latent Diffusion Models (LDMs) pinpoint this problem to the high dimensionality of the pixel space, in which the diffusion process occurs and propose to perform it in a more compact latent space instead. In short, they achieve this feat by pertaining an autoencoder model that learns an efficient compact latent space that is perceptually equivalent to the pixel space. A DM sandwiched between the convolutional encoder-decoder is then trained inside the latent space in a more computationally-efficient way. In other words, this is a VQGAN with a DM instead of a transformer (and without a discriminator). As for the details, let’s dive in, shall we? Full summary: https://t.me/casual_gan/293 Blog post: https://www.casualganpapers.com/high-res-faster-diffusion-democratizing-diffusion/Latent-Disffusion-Models-explained.html Latent Diffusion Models arxiv / code Join the discord community and follow on Twitter for weekly AI paper summaries! submitted by /u/KirillTheMunchKing [link] [comments]  ( 1 min )
    Strange Galaxies (A.I animation + sound design)
    submitted by /u/nenomancer [link] [comments]
    Focus on the Process: Formulating AI Ethics Principles More Responsibly
    submitted by /u/regalalgorithm [link] [comments]
    1914 - People engaged in winter sports activities near Wetzlar, Hesse, Germany [Colored using AI]
    submitted by /u/pheonix_bird [link] [comments]
    Stanford has trained AI to classify proteins
    submitted by /u/aidev2040 [link] [comments]
    Common Voice Has A New Dataset
    submitted by /u/limapedro [link] [comments]
    7 Best Natural Language Processing Courses (2022) | Best NLP Courses
    submitted by /u/maneesh123456 [link] [comments]
    Robots still not around us?… — mathematicians to blame
    submitted by /u/marvelmind_robotics [link] [comments]  ( 1 min )
    A Look at Machine Learning
    submitted by /u/lutipri [link] [comments]
    Generating alternate Spongebob theme songs with OpenAI Jukebox
    submitted by /u/PF-Wang [link] [comments]
    AI fitness app
    Hi guys, recently came across a new fitness app with artificial intelligence, looks interesting and promising, and do you think AI can replace real trainers? Here's the link QR Kinestex (google.com) submitted by /u/vovayoung [link] [comments]
    Zama Open-Sources Concrete ML v0.2 To Support Data Scientists Without Any Prior Cryptography Knowledge To Automatically Turn Classical Machine Learning (ML) Models Into Their FHE Equivalent
    Zama is a Paris-based startup that aims to bring end-to-end encryption to AI by enabling developers to use Python to create models that run on encrypted data. In late April, the Zama Team released the public alpha release of Concrete ML, a package developed on top of Concrete Numpy. This release provides data scientists with no prior knowledge of cryptography with simple APIs for automatically converting traditional machine learning (ML) models into their FHE equivalents. One of the main goals of this version is to make using Concrete ML as convenient as possible for users of popular machine learning frameworks. Model training for linear models and trees is not reimplemented in Concrete ML, allowing researchers to utilize several variations and features of these models that the scikit-learn package supports. The team conducted a series of experiments to compare the models between scikit-learn and Concrete ML. When tested on simple 2D linear models, FHE performance was comparable to that of its unencrypted scikit-learn equivalents. However, as the number of dimensions increases in the current release, the performance of strongly quantized classifiers rapidly falls, which will be refined in future releases. On encrypted data, tree-based classifiers utilizing Concrete ML show outstanding accuracy. Running tree models that demand heavy comparisons can be easily enabled thanks to Zama’s unique approach to FHE that provides Programmable Bootstrapping. As a result, tree-based models perform as well as their scikit-learn/xgboost counterparts in FHE. This remains true even for datasets with many dimensions, and tree-based models are typically the most performant when dealing with tabular data. Data scientists can implement Decision Trees, Random Forests, and Gradient Boosted Trees after the team publishes specific deployment APIs. Continue Reading Github: https://github.com/zama-ai/concrete-ml submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    AI painting Marvel superheroes
    submitted by /u/notrealAI [link] [comments]  ( 1 min )
  • Open

    Alpa: Automated Model-Parallel Deep Learning
    Posted by Zhuohan Li, Student Researcher, Google Research, and Yu Emma Wang, Senior Software Engineer, Google Core Over the last several years, the rapidly growing size of deep learning models has quickly exceeded the memory capacity of single accelerators. Earlier models like BERT (with a parameter size of < 1GB) can efficiently scale across accelerators by leveraging data parallelism in which model weights are duplicated across accelerators while only partitioning and distributing the training data. However, recent large models like GPT-3 (with a parameter size of 175GB) can only scale using model parallel training, where a single model is partitioned across different devices. While model parallelism strategies make it possible to train large models, they are more complex in that th…  ( 8 min )
  • Open

    The Scientist of the Scientist
    Science has been the most important tool of humanity even before the dawn of history. Humans have used science, without even knowing that…  ( 5 min )
  • Open

    Quartal melody: Star Trek fanfare
    Intervals of a fourth, such as the interval from C to F, are common in western music, but consecutive intervals of this size are not. Quartal harmony is based on intervals of fourths, and quartal melodies use a lot of fourths, particularly consecutive fourths. Maybe the most famous quartal melody is the opening fanfare to […] Quartal melody: Star Trek fanfare first appeared on John D. Cook.  ( 2 min )
  • Open

    What are some top venues for the submission of a Reinforcement learning related paper?
    submitted by /u/blitzkreig3 [link] [comments]  ( 1 min )
    Explanation of Optimal Policies and Value Functions | Dynamic Programming
    submitted by /u/shani_786 [link] [comments]
  • Open

    Rethinking Human-in-the-Loop for Artificial Augmented Intelligence
    How do we build and evaluate an AI system for real-world applications? In most AI research, the evaluation of AI methods involves a training-validation-testing process. The experiments usually stop when the models have good testing performance on the reported datasets because real-world data distribution is assumed to be modeled by the validation and testing data. However, real-world applications are usually more complicated than a single training-validation-testing process. The biggest difference is the ever-changing data. For example, wildlife datasets change in class composition all the time because of animal invasion, re-introduction, re-colonization, and seasonal animal movements. A model trained, validated, and tested on existing datasets can easily be broken when newly collected dat…  ( 5 min )
  • Open

    ‘In the NVIDIA Studio’ Welcomes Concept Designer Yangtian Li
    This week In the NVIDIA Studio, we welcome Yangtian Li, a senior concept artist at Singularity6. Li is a concept designer and illustrator who has worked on some of the biggest video game franchises, including Call of Duty, Magic: the Gathering and Vainglory. Her artwork also appears in book illustrations and magazines. The post ‘In the NVIDIA Studio’ Welcomes Concept Designer Yangtian Li appeared first on NVIDIA Blog.  ( 3 min )

  • Open

    [D] How to do unsupervised anomaly detection in a principled way?
    I have terabytes of tabular and image data. I want to determine what data points are novel/anomalous. I have a couple anchor data points for entities that are known and features about said entities. Effectively I want to find things that are novel relative to my anchors without any labels for if something is actually interesting and not noise to train on. Here's an analogy: I have driving footage of a reference driver who is "perfectly" driving. I also have all of the sensor data: accelerometers, where the human is looking, etc. I simulate 1000s of self-driving cars that are running their own agent models. I get data on all of these as well. I deploy an agent to the real world that is expected to do better/comparable to the reference driver while having different model parameters. I don't know how this driving agent will do in the real world. I want to choose agent instantiations that are different from the rest in some regard (anomaly) I'm constrained by how many deployments I can make. It's expensive and dangerous. ​ An issue I face with this is that there are dozens of models that will give me anomalous points, but sometimes they don't overlap at all. There's no consistency. Ideally I want to find data points that are novel in feature space (far away?), but expected to be functional... on non-labeled data. So yea... after writing this it kind of sounds like just a screwed situation, but maybe someone here has an idea/experience of how to build a system that can enable this kind of parameter selection engine (suggesting novel parameters that are functional without any labeled data). submitted by /u/memproc [link] [comments]  ( 1 min )
    [P] Transfer Learning with BERT, number of examples.
    I'm looking to make a review classifier (3 classification). It sounds like transfer learning is standard these days. But how many examples would generally be needed to make an improvement on the out of the box result? Would 1,000 be sufficient or realistically would it make to be in the 10's of thousands. Or is it a "how long is a piece of string question" edit: sorry wrong tag submitted by /u/mldude8 [link] [comments]  ( 1 min )
    [D] Present ML classification model results for non technical
    I wanna include Machine Learning classification (str target) results into a web platform, what informations that I can show and will be meaningful , I can already think of (Predicted value, Accuracy, Probability) what else I can include, graphics or anything useful to present my results for non technical people (clients)? submitted by /u/According-Promise-23 [link] [comments]  ( 1 min )
    [D] How do some people publish so much in this field?
    I'm not talking about PIs necessarily, but sometimes I search up other grad students in my department or people who email me and some of them have 10+ first author papers a year? I don't really understand how this is possible; I don't think I'm the best implementer out there but it would take me at least 2-3 months to read the literature, come up with a new research question, run experiments, iterate and write up. I mostly work with just me and my advisor though, maybe that's the difference? If you're one of these people, are you just hyper-productive? Already supervising students? Or collaborating with a ton of people and not doing a high proportion of work on any project? submitted by /u/IPvIV [link] [comments]  ( 5 min )
    [P] Pretraining dense retrievers with masked language model objective(REALM)
    Hi, I made a video explaining REALM. It is a pretraining method for dense retrievers. It uses a language model along with a retriever for pretraining. Given a random masked sentence like "Each angle in an equilateral triangle is [MASK]", the retriever gets top passages that might contain information about equilateral triangles. The passages are then passed to a language model to predict the value for each "[MASK]" token. Using this MLM objective, as model performance improves so does the quality of retrieval. A simple and effective idea for pretraining. This is the final video of our series on Open-domain question answering using dense retrievers. I will appreciate any feedback. Thanks for the support till now. https://www.youtube.com/watch?v=aQcoI1t6HOs submitted by /u/infiniteakashe [link] [comments]  ( 1 min )
    [D] Where did MT-NLG go wrong with their scaling experiments, comparing its capabilities to PaLM?
    The MT-NLG model was 530B parameters compared to PaLM's 540B. They seem to have done things correctly from what I skimmed, However their model is neither that impressive on benchmarks, nor does it demonstrate any special capabilities. So what was the reason MT-NLG didn't work as well as expected? Is it possible it has abilities to explain jokes (on par PaLM) but they were undiscovered by the authors? Or are there any gaping flaws in how they scale the different hyperameters (heads, layers, dims etc.)? Perhaps such an analysis has already been done, but I would love to see what you guys think about why it underperformed... In such an unknown area as this, it seems that unless one scales models with multiple attempts it's hard to accurately judge when we would have reached the point where scaling laws fall off. submitted by /u/Competitive-Rub-1958 [link] [comments]  ( 1 min )
    [R] How to access CVPR 21 workshop extended abstracts?
    Hi everyone, I intend to submit a short paper to a 2022 CVPR workshop. I wanted to get inspiration from other CVPR short abstracts to see how they are written, their structure and so on. However, I'm really struggling a lot to find a place on the internet where I can download workshop extended abstracts from. Does anyone know where one can download the 2021 CVPR workshop short papers? submitted by /u/BigDataOverflow [link] [comments]  ( 1 min )
    [D] Train model to predict continuous variable ranging from 0 to 1 using images as input.
    I have thousands of images of structures that I am trying to use to train a NN to predict their parameters. The inputs are the 2D images of the structure, and the output is the 1 parameter I am trying to predict (I have 2 parameters I am trying to predict but one is fine too). I have tried using the dimensions of the structure to predict the parameters but the error was too high. What do you recommend I do for this? Links are appreciated submitted by /u/ftority [link] [comments]  ( 1 min )
    [D] How do you test (unit, integration) your Machine Learning models/pipelines?
    First things first, do you test them at all? Practices can differ across companies.. Secondly, I believe that testing process can differ based on the model use case (CV, NLP..) but is there any unified set of recommendations, good practices? Can general approaches regarding unit and integration testing practices be applied to ML models/pipelines? What is your approach to this? If you feel like writing maybe you could write in comments: Area of ML that you are talking about: traditional ml algorithms, CV models, NLP models What are you testing? Is it the validity of the forward pass? (example: unit testing the shapes) Is it the validity of the training loop? (somehow) Are you testing if your model's performance is above some threshold? (example: accuracy, F1, bleu, maybe execution time?) Are you testing if your model is making correct predictions on some crucial cases Are you testing important metrics related to your dataset? (example: if it's properly standardized) Some of this things can be tested pretty easily and don't require automated tests, but it is nice to have them. Do you have some other ways to do sanity checks? Please feel free to add other aspects of ML models/pipelines which are worth testing. Looking forward to your insights! submitted by /u/Icy_Fisherman7187 [link] [comments]  ( 2 min )
    [D] What program/software can make animations like this?
    Hi, looking for advice for software/programs that can make animations like this one - thanks! https://upload.wikimedia.org/wikipedia/commons/transcoded/9/92/Infinitely_wide_neural_network.webm/Infinitely_wide_neural_network.webm.360p.vp9.webm submitted by /u/unital [link] [comments]
    [R] A very preliminary analysis of DALL-E 2
    submitted by /u/hardmaru [link] [comments]  ( 1 min )
    [P] The easiest way to process and tag video data
    submitted by /u/happybirthday290 [link] [comments]  ( 3 min )
    [D] How to effectively sample from high dimensional space and create data-efficient training.
    So I have been working on generating data and creating a neural network to predict deformation in meshes, given some mesh parameters like thickness, elasticity, point of force, etc. What I know for sure is these parameters that I am creating a dataset for are/will be in a range and some meshes will have the same deformation for some combined different values of these parameters. What I have tried is uniformly sample from each of these parameters but this seems very data inefficient because there might be some blind spots in the combined sampling, say deformation for a particular combination of parameters can be different and unseen. My questing is how do you efficiently sample from a large multi-dimensional space, is there a better training method that would somehow inform another network to sample efficiently? ​ I will be happy to explain more if this is somehow unclear. submitted by /u/bitemenow999 [link] [comments]  ( 1 min )
    [D] Multiple PDF Files Similarity
    Hi Everyone, I am developing one application in that I have multiple pdf files which user will upload then application will group those PDFs according to their similarity%(% entered by user) if user enters 80% it means documents with atleast 80% similarity will group in batch 1 and so on (similar pdf files will group into one batches i e. Batch 1, batch 2....). PDFs documents can be text or image or collection of both. I am using .Net Core and Angular. I tried Kmean algo but results are different everytime as an Algo is picking random files for centroids. How can i implement this? submitted by /u/ConsciousSlice4329 [link] [comments]  ( 1 min )
  • Open

    Robert Magno (Run:AI) - Building the Best AI Infrastructure Stack to Accelerate Your Data Science
    submitted by /u/Dracutela [link] [comments]
    Weekly China AI News: Beijing Issues First-Ever Driverless Robotaxi Permits; Huawei Anticipates AI Compute to Jump 500 Times by 2030; CogView2 Challenges DALL-E-2 With Better Results
    submitted by /u/trcytony [link] [comments]  ( 1 min )
    How many years away do you think we are from AI that can turn our old games into something that looks like UE5?
    To me it seems to make sense that we would begin relying more and more on AI to improve our old games. For example, ffxiv won't be getting raytracing anytime soon, but I imagine it's only a matter of time before I could download some AI-based image software (like Dall-E) that lets me tweak my gameplay experience to look much more realistic. Do you think I'm right in assuming this is the likely trajectory of AI's use in game graphics? And if the answer to 1 is yes, how far away do you think such technology is? Thank you! submitted by /u/solidwhetstone [link] [comments]  ( 1 min )
    Do you think that AI will ever be as smart as humans (AGI)? If you do think so, when or around what time period?
    Hey guys. I'm working on a project for school and would like to hear your guy's opinions as the leading AI subreddit on whether or not AGI will be achieved, and if so, when. I'd greatly appreciate your input! submitted by /u/SurroundSwimming3494 [link] [comments]  ( 3 min )
    Last Week in AI: AI helps model volcanoes, Anthropic gets $580M for more explainable AI, AI algorithms that screen for child neglect, and more!
    submitted by /u/regalalgorithm [link] [comments]  ( 1 min )
    What's the involvement of AI in Transforming the Media & Entertainment
    Artificial intelligence (AI) will continue to disrupt the media sector, just as it did in 2020 and 2021. AI will most likely fulfill the three critical roles of recommendation, speech recognition, and media automation in this market. Read more: Artificial Intelligence will Continue to Transform the M&E Landscape submitted by /u/JencyJane [link] [comments]  ( 1 min )
    Artificial Intelligence Implications: The Future of Modern Wargaming! (Would love some feedback on a University blog post surrounding Wargaming and the use of AI!)
    submitted by /u/RvZz11 [link] [comments]  ( 1 min )
    5 Best Machine Learning Courses for Beginners, Advanced learn in 2022 -
    submitted by /u/maneesh123456 [link] [comments]
    How to deepfake an audio where you just change the voice but keep what’s been said.
    For example: The scene from Taxi Driver where the protagonist speaks with himself in front of the mirror, how to keep what he’s saying but change his voice for Morgan Freeman’s voice. submitted by /u/Accomplished-Door-61 [link] [comments]  ( 1 min )
    Web Scraping with Python - Learning the Basics | Rubik's Code
    submitted by /u/RubiksCodeNMZ [link] [comments]
    Spoofing detector using YoloV4 Tiny 3L
    submitted by /u/Gloomy_Recognition_4 [link] [comments]  ( 1 min )
    AI2 Open-Sources ‘LM-Debugger’: An Interactive Tool For Inspection And Intervention In Transformer-Based Language Models
    In natural language processing, a language model is a probabilistic statistical model that calculates the likelihood of a specific sequence of words appearing in a phrase based on the preceding words. As a result, it’s common in predictive text input systems, speech recognition, machine translation, and spelling correction, among other applications. They are a method of converting qualitative text information into quantitative data that machines can interpret. Modern NLP models rely on transformer-based language models (LMs). However, a lot more research is to be done under their fundamental prediction development process. Unclear prediction behavior becomes an obstacle for both end-users who don’t comprehend why a model generates certain predictions and developers who want to diagnose or fix model behavior. A new paper published by a group of researchers from Allen Institute for AI, Tel Aviv University, Bar-Ilan University, and the Hebrew University of Jerusalem introduces LM-Debugger, an interactive open-source tool for fine-grained interpretation and intervention in LM predictions. This work will increase the transparency of LMs. Continue Reading Paper: https://arxiv.org/abs/2204.12130 Github: https://github.com/mega002/lm-debugger submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    A new Trading AI?
    Hello Everybody. I have this idea for a trading bot but realised its more in the realm of AI. Im highly uneducated in this field and dont know what im talking about. If possible could someone please answer a few questions. I have found a super successful Day Trader that ive been learning from for a couple months. Hes supposedly "reverse engineered the markets" Ive drawn the trend lines and done the technical analysis hes taught me and its scarily accurate. My only problem is Day Tradimg takes time. So if i could have a bot understand his "science" and do it while im not there it would be a free money hack lol. Q1: How hard is it on a scale of 1-10 to develop/create a Trading AI? Q2: Could someone estimate what a project like this would cost me? Q3: Has a trading AI ever been created? I really appreciate if you read through this and i would really really appreciate answers. Thanks :) submitted by /u/just_conor12 [link] [comments]  ( 3 min )
    Poker AI Plays Itself
    submitted by /u/bluboxsw [link] [comments]
    AI News | AI Powered Robotic Boat Autonomously Cleans Harbors And Rivers | AI Cataract Detection | Machine Learning To Only Propose Molecules Which Are Synthesizable In Lab
    submitted by /u/getrich_or_diemining [link] [comments]  ( 1 min )
    Researchers From MIT and Cornell Develop STEGO (Self-Supervised Transformer With Energy-Based Graph Optimization): A Novel AI Framework That Distills Unsupervised Features Into High-Quality Discrete Semantic Labels
    Unsupervised semantic segmentation seeks to uncover and localize semantically significant categories within image corpora without any annotation. However, there are several challenges in creating annotated training data. These challenges frequently often outweigh semantic segmentation methods’ superior accuracy. Algorithms must develop features for every pixel that are both semantically relevant and compact enough to form discrete clusters to extract meaningful categories with any annotation from the training data. A team of researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), Google, and Cornell University has achieved this by creating a machine learning model named STEGO (Self-supervised Transformer with Energy-based Graph Optimization) that surpasses previous methods by decoupling feature learning from cluster compactification. A frozen backbone makes up STEGO, and it serves as a source of learning feedback and input to the segmentation head for predicting distilled characteristics. This segmentation head is a direct feed-forward network with a ReLU activation function. Unlike earlier studies, the algorithm’s efficiency was increased without retraining or fine-tuning the backbone. The STEGO neural network retrieves global image information by pooling spatial variables in a global average. Then, based on the cosine similarity in the backbone’s feature space, a lookup table is computed for each image’s K-Nearest Neighbours. Continue Reading Paper: https://arxiv.org/pdf/2203.08414.pdf Github: https://github.com/mhamilton723/STEGO submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
  • Open

    What background knowledge is needed to understand traditional exploration methods?
    I can understand most traditional and current methods of reinforcement learning, however when it comes to traditional exploration methods such as UCB or PUCB I am mostly lost. I can understand the algorithms intuitively when explained, but when I go over the papers and look at the proofs/explanations there seems to be some background knowledge I am not privy to. The book Reinforcement Learning: An Introduction doesn't seem to go over these in-depth as well. BTW I have a bachelor's and master's degree in Computer Science and have taken math classes such as statistics, calculus, LA, algorithms, ect. submitted by /u/Stochastic_Machine [link] [comments]  ( 1 min )
    Help needed to understand Rllib attention model parameters
    Hi! I recently started to learn about using attention (transformers). I use rllib implementation (https://docs.ray.io/en/latest/rllib/rllib-models.html#default-model-config-settings) which follows some paper. So far I have some understanding of transformers but I still don't understand everything. When playing with parameters I saw couple which I don't understand how these affect the learning and what they exactly do ( I looked at code and paper but still don't understand). Can somebody explain what are these and how these affect the learning: 1) attention_memory_inference 2) attention_memory_training (also related to previous point: when training why attention_memory_training and input sequence are concated? https://github.com/ray-project/ray/blob/master/rllib/models/torch/attention_net.py#L91 ) Many thanks :D submitted by /u/HBlackwooder [link] [comments]  ( 1 min )
    Is this how a simple parallel environment training works in DDPG/TD3 model?
    I tried to figure out how to do parallel environment training using TD3 or DDPG but just got bits and pieces of information, so this is what I came up with. This is what I have (a Unity "ant" environment): One ant in one environment But I want this: Parallel ants in multiple environments As a result, I can learn faster because I have more data. So here's how I envision parallel training: Agents collect actions from one TD3/DDPG model. Collect observations from all agents and put them into the memory buffer. Do 1 and 2 steps until every agent finishes its episode. If the agent finished its episode ahead of other agents, it simply waits. Then comes the training DDPG/TD3 step. And repeat. Would this type of training even work? submitted by /u/Dougller [link] [comments]  ( 1 min )
    Unity with MLAgents, Isaac Gym, OpenAI Gym and other environments to experiment with reinforcement learning
    Hello, I am a master's student in computer science and I am specializing in artificial intelligence. I am approaching reinforcement learning for the first time in an intelligent robotics course, and would like to experiment with these techniques in a simulated environment. I'm looking for something that is not too complex to learn (I want to focus more on algorithm implementation and simulation rather than spending months figuring out the tool) and that gives enough freedom to modify and implement. Unity seemed interesting because a possible application of these techniques that I would like to explore is in the world of gaming and simulation, but I do not know if a solid knowledge of Unity is required to set up a decent environment before you can get into the experiments. Moreover, I don't know up to what level of detail is possible to enter in the case of MLAgents and how much is possible to customize. Do you recommend one of these that I proposed in the title or also other? Also, do you recommend some material (books, videos, courses, tutorials) to study that is a good compromise between explanation of the tool and theoretical part? Thanks in advance! submitted by /u/Parruck [link] [comments]  ( 2 min )
    creating custom environment for training RL agent
    Hi, I'm working on a project on 2d image reassembly using reinforcement learning. I want to create a custom (puzzle) environment for doing RL. Is there anyway to create a custom environment that can be integrated with gym. If so, how to create one? Thanks. submitted by /u/Praveen_Raja22 [link] [comments]  ( 1 min )
  • Open

    Achieve hyperscale performance for model serving using NVIDIA Triton Inference Server on Amazon SageMaker
    Machine learning (ML) applications are complex to deploy and often require multiple ML models to serve a single inference request. A typical request may flow across multiple models with steps like preprocessing, data transformations, model selection logic, model aggregation, and postprocessing. This has led to the evolution of common design patterns such as serial inference […]  ( 15 min )
    Build a corporate credit ratings classifier using graph machine learning in Amazon SageMaker JumpStart
    Today, we’re releasing a new solution for financial graph machine learning (ML) in Amazon SageMaker JumpStart. JumpStart helps you quickly get started with ML and provides a set of solutions for the most common use cases that can be trained and deployed with just a few clicks. The new JumpStart solution (Graph-Based Credit Scoring) demonstrates […]  ( 8 min )
    Increase your content reach with automated document-to-speech conversion using Amazon AI services
    Reading the printed word opens up a world of information, imagination, and creativity. However, scanned books and documents may be difficult for people with vision impairment and learning disabilities to consume. In addition, some people prefer to listen to text-based content versus reading it. A document-to-speech solution extends the reach of digital content by giving […]  ( 8 min )
  • Open

    Do you usually build neural networks from scratch or do you use a library dedicated to it?
    I´´m curious, I´ve seen both and while the ones using libraries sound harder to make, the ones made from scratch sound slightly sloppier. For context, I mean for predator-prey simulations and such, not image to text recognition or anything that hard. submitted by /u/HesAMagicalPoney55 [link] [comments]  ( 1 min )
    [Question] Interval Branch and Bound
    Hello everyone, I'm studying the Branch and Bound algorithm, and I got a question (Reddit looks like the perfect place to ask other experienced people). I already have a working Branch and Bound algorithm but I want to adapt it to be working with intervals instead of one single value. And how do I do it? ​ On Branch and Bound I have something like: For each sample, I'm calculating the costs and then running the bab algorithm with those and my values. if (x - y < 0) then keep going with the tree else stop with the search But when adapting for Interval Branch and Bound would be something like: if ((x + ∆x) - (y+∆y) < 0) keep going with the tree else stop with the search and then I still need to check for the other bound (x - ∆x). ​ I will need to calculate twice the costs (once for each bound) and search the tree one time also for each cost? I can be saying a really big mistake, but that's why I'm here. Someone can help be clarifying my problem? Thank you for your attention. submitted by /u/Zarathos_PT [link] [comments]  ( 1 min )
    Future Computers Will Be Radically Different
    submitted by /u/keghn [link] [comments]
  • Open

    Why target ads at pregnant women
    I’m listening to a podcast interviewing Neil Richards, the author of Why Privacy Matters. Richards makes a couple interesting points about the infamous example of Target figuring out which women were pregnant based on their purchase history. First, pregnancy is a point at which women are open to trying new things. So if a company […] Why target ads at pregnant women first appeared on John D. Cook.  ( 1 min )
    Curiously simple approximations
    As I’ve written about here and elsewhere, the following simple approximations are fairly accurate. log10 x ≈ (x-1)/(x+1) loge x ≈ 2 (x – 1)/(x + 1) log2 x ≈ 3(x – 1)/(x + 1) It’s a little surprising that each is as accurate as it is, but it’s also surprising that the approximations for […] Curiously simple approximations first appeared on John D. Cook.  ( 1 min )
  • Open

    Mown Away: Startup Rolls Out Autonomous Lawnmower With Cutting-Edge Tech
    Jack Morrison and Isaac Roberts, co-founders of Replica Labs, were restless two years after their 3D vision startup was acquired, seeking another adventure. Then, in 2018, when Morrison was mowing his lawn, it struck him: autonomous lawn mowers. The two, along with Davis Foster, co-founded Scythe Robotics. The company, based in Boulder, Colo., has a Read article > The post Mown Away: Startup Rolls Out Autonomous Lawnmower With Cutting-Edge Tech appeared first on NVIDIA Blog.  ( 3 min )
    Meet the Omnivore: 3D Artist Creates Towering Work With NVIDIA Omniverse
    Edward McEvenue grew up making claymations in LEGO towns. Now, he’s creating photorealistic animations in virtual cities, drawing on more than a decade of experience in the motion graphics industry. The post Meet the Omnivore: 3D Artist Creates Towering Work With NVIDIA Omniverse appeared first on NVIDIA Blog.  ( 3 min )
  • Open

    Business Model Transformation: Keys to Monetizing the Edge
    I was conducting a Tech Talk at a client when I mentioned the astronomical growth of data at “the edge”; that data creation at the edge is growing almost as fast as that in the cloud according to IDC. This sent a noticeable ripple across the executive team audience.  The executives immediately began debating “How… Read More »Business Model Transformation: Keys to Monetizing the Edge The post Business Model Transformation: Keys to Monetizing the Edge appeared first on Data Science Central.  ( 5 min )
    Are PDF Documents a Thing of the Past?
    There has been many articles predicting the death of the PDF format, invented in 1993. Some of these articles are 10 years old: you can find them by googling “death of PDF”. With the advent of fluid or liquid layout design on almost every website, widespread Internet browsing on small devices (cell phones), notebooks combining… Read More »Are PDF Documents a Thing of the Past? The post Are PDF Documents a Thing of the Past? appeared first on Data Science Central.  ( 4 min )
  • Open

    10 Ways Technology is Changing Healthcare: How Innovation is Impacting the Medical Industry
    Source: Managed Healthcare Executive  ( 6 min )
  • Open

    Using Kaggle in Machine Learning Projects
    You’ve probably heard of Kaggle data science competitions, but did you know that Kaggle has many other features that can help you with your next machine learning project? For people looking for datasets for their next machine learning project, Kaggle allows you to access public datasets by others and share your own datasets. For those […] The post Using Kaggle in Machine Learning Projects appeared first on Machine Learning Mastery.  ( 7 min )
  • Open

    On the Optimization of Margin Distribution. (arXiv:2204.14118v1 [cs.LG])
    Margin has played an important role on the design and analysis of learning algorithms during the past years, mostly working with the maximization of the minimum margin. Recent years have witnessed the increasing empirical studies on the optimization of margin distribution according to different statistics such as medium margin, average margin, margin variance, etc., whereas there is a relative paucity of theoretical understanding. In this work, we take one step on this direction by providing a new generalization error bound, which is heavily relevant to margin distribution by incorporating ingredients such as average margin and semi-variance, a new margin statistics for the characterization of margin distribution. Inspired by the theoretical findings, we propose the MSVMAv, an efficient approach to achieve better performance by optimizing margin distribution in terms of its empirical average margin and semi-variance. We finally conduct extensive experiments to show the superiority of the proposed MSVMAv approach.  ( 2 min )
    Intuitive Shape Editing in Latent Space. (arXiv:2111.12488v2 [cs.CV] UPDATED)
    The use of autoencoders for shape editing or generation through latent space manipulation suffers from unpredictable changes in the output shape. Our autoencoder-based method enables intuitive shape editing in latent space by disentangling latent sub-spaces into style variables and control points on the surface that can be manipulated independently. The key idea is adding a Lipschitz-type constraint to the loss function, i.e. bounding the change of the output shape proportionally to the change in latent space, leading to interpretable latent space representations. The control points on the surface that are part of the latent code of an object can then be freely moved, allowing for intuitive shape editing directly in latent space. We evaluate our method by comparing to state-of-the-art data-driven shape editing methods. We further demonstrate the expressiveness of our learned latent space by leveraging it for unsupervised part segmentation.  ( 2 min )
    Network Topology Optimization via Deep Reinforcement Learning. (arXiv:2204.14133v1 [cs.NI])
    Topology impacts important network performance metrics, including link utilization, throughput and latency, and is of central importance to network operators. However, due to the combinatorial nature of network topology, it is extremely difficult to obtain an optimal solution, especially since topology planning in networks also often comes with management-specific constraints. As a result, local optimization with hand-tuned heuristic methods from human experts are often adopted in practice. Yet, heuristic methods cannot cover the global topology design space while taking into account constraints, and cannot guarantee to find good solutions. In this paper, we propose a novel deep reinforcement learning (DRL) algorithm, called Advantage Actor Critic-Graph Searching (A2C-GS), for network topology optimization. A2C-GS consists of three novel components, including a verifier to validate the correctness of a generated network topology, a graph neural network (GNN) to efficiently approximate topology rating, and a DRL actor layer to conduct a topology search. A2C-GS can efficiently search over large topology space and output topology with satisfying performance. We conduct a case study based on a real network scenario, and our experimental results demonstrate the superior performance of A2C-GS in terms of both efficiency and performance.  ( 2 min )
    Goldilocks-curriculum Domain Randomization and Fractal Perlin Noise with Application to Sim2Real Pneumonia Lesion Detection. (arXiv:2204.13849v1 [cs.CV])
    A computer-aided detection (CAD) system based on machine learning is expected to assist radiologists in making a diagnosis. It is desirable to build CAD systems for the various types of diseases accumulating daily in a hospital. An obstacle in developing a CAD system for a disease is that the number of medical images is typically too small to improve the performance of the machine learning model. In this paper, we aim to explore ways to address this problem through a sim2real transfer approach in medical image fields. To build a platform to evaluate the performance of sim2real transfer methods in the field of medical imaging, we construct a benchmark dataset that consists of $101$ chest X-images with difficult-to-identify pneumonia lesions judged by an experienced radiologist and a simulator based on fractal Perlin noise and the X-ray principle for generating pseudo pneumonia lesions. We then develop a novel domain randomization method, called Goldilocks-curriculum domain randomization (GDR) and evaluate our method in this platform.  ( 2 min )
    A review of Federated Learning in Intrusion Detection Systems for IoT. (arXiv:2204.12443v2 [cs.CR] UPDATED)
    Intrusion detection systems are evolving into intelligent systems that perform data analysis searching for anomalies in their environment. The development of deep learning technologies opened the door to build more complex and effective threat detection models. However, training those models may be computationally infeasible in most Internet of Things devices. Current approaches rely on powerful centralized servers that receive data from all their parties -- violating basic privacy constraints and substantially affecting response times and operational costs due to the huge communication overheads. To mitigate these issues, Federated Learning emerged as a promising approach where different agents collaboratively train a shared model, neither exposing training data to others nor requiring a compute-intensive centralized infrastructure. This paper focuses on the application of Federated Learning approaches in the field of Intrusion Detection. Both technologies are described in detail and current scientific progress is reviewed and categorized. Finally, the paper highlights the limitations present in recent works and presents some future directions for this technology.
    Forecasting large-scale circulation regimes using deformable convolutional neural networks and global spatiotemporal climate data. (arXiv:2202.04964v2 [cs.LG] UPDATED)
    Classifying the state of the atmosphere into a finite number of large-scale circulation regimes is a popular way of investigating teleconnections, the predictability of severe weather events, and climate change. Here, we investigate a supervised machine learning approach based on deformable convolutional neural networks (deCNNs) and transfer learning to forecast the North Atlantic-European weather regimes during extended boreal winter for 1 to 15 days into the future. We apply state-of-the-art interpretation techniques from the machine learning literature to attribute particular regions of interest or potential teleconnections relevant for any given weather cluster prediction or regime transition. We demonstrate superior forecasting performance relative to several classical meteorological benchmarks, as well as logistic regression and random forests. Due to its wider field of view, we also observe deCNN achieving considerably better performance than regular convolutional neural networks at lead times beyond 5-6 days. Finally, we find transfer learning to be of paramount importance, similar to previous data-driven atmospheric forecasting studies.
    SleepPPG-Net: a deep learning algorithm for robust sleep staging from continuous photoplethysmography. (arXiv:2202.05735v4 [cs.LG] UPDATED)
    Introduction: Sleep staging is an essential component in the diagnosis of sleep disorders and management of sleep health. It is traditionally measured in a clinical setting and requires a labor-intensive labeling process. We hypothesize that it is possible to perform robust 4-class sleep staging using the raw photoplethysmography (PPG) time series and modern advances in deep learning (DL). Methods: We used two publicly available sleep databases that included raw PPG recordings, totalling 2,374 patients and 23,055 hours. We developed SleepPPG-Net, a DL model for 4-class sleep staging from the raw PPG time series. SleepPPG-Net was trained end-to-end and consists of a residual convolutional network for automatic feature extraction and a temporal convolutional network to capture long-range contextual information. We benchmarked the performance of SleepPPG-Net against models based on the best-reported state-of-the-art (SOTA) algorithms. Results: When benchmarked on a held-out test set, SleepPPG-Net obtained a median Cohen's Kappa ($\kappa$) score of 0.75 against 0.69 for the best SOTA approach. SleepPPG-Net showed good generalization performance to an external database, obtaining a $\kappa$ score of 0.74 after transfer learning. Perspective: Overall, SleepPPG-Net provides new SOTA performance. In addition, performance is high enough to open the path to the development of wearables that meet the requirements for usage in clinical applications such as the diagnosis and monitoring of obstructive sleep apnea.
    Exploration and Exploitation in Federated Learning to Exclude Clients with Poisoned Data. (arXiv:2204.14020v1 [cs.DC])
    Federated Learning (FL) is one of the hot research topics, and it utilizes Machine Learning (ML) in a distributed manner without directly accessing private data on clients. However, FL faces many challenges, including the difficulty to obtain high accuracy, high communication cost between clients and the server, and security attacks related to adversarial ML. To tackle these three challenges, we propose an FL algorithm inspired by evolutionary techniques. The proposed algorithm groups clients randomly in many clusters, each with a model selected randomly to explore the performance of different models. The clusters are then trained in a repetitive process where the worst performing cluster is removed in each iteration until one cluster remains. In each iteration, some clients are expelled from clusters either due to using poisoned data or low performance. The surviving clients are exploited in the next iteration. The remaining cluster with surviving clients is then used for training the best FL model (i.e., remaining FL model). Communication cost is reduced since fewer clients are used in the final training of the FL model. To evaluate the performance of the proposed algorithm, we conduct a number of experiments using FEMNIST dataset and compare the result against the random FL algorithm. The experimental results show that the proposed algorithm outperforms the baseline algorithm in terms of accuracy, communication cost, and security.
    The Geometry of Memoryless Stochastic Policy Optimization in Infinite-Horizon POMDPs. (arXiv:2110.07409v3 [math.OC] UPDATED)
    We consider the problem of finding the best memoryless stochastic policy for an infinite-horizon partially observable Markov decision process (POMDP) with finite state and action spaces with respect to either the discounted or mean reward criterion. We show that the (discounted) state-action frequencies and the expected cumulative reward are rational functions of the policy, whereby the degree is determined by the degree of partial observability. We then describe the optimization problem as a linear optimization problem in the space of feasible state-action frequencies subject to polynomial constraints that we characterize explicitly. This allows us to address the combinatorial and geometric complexity of the optimization problem using recent tools from polynomial optimization. In particular, we estimate the number of critical points and use the polynomial programming description of reward maximization to solve a navigation problem in a grid world.
    Towards Exemplar-Free Continual Learning in Vision Transformers: an Account of Attention, Functional and Weight Regularization. (arXiv:2203.13167v3 [cs.CV] UPDATED)
    In this paper, we investigate the continual learning of Vision Transformers (ViT) for the challenging exemplar-free scenario, with special focus on how to efficiently distill the knowledge of its crucial self-attention mechanism (SAM). Our work takes an initial step towards a surgical investigation of SAM for designing coherent continual learning methods in ViTs. We first carry out an evaluation of established continual learning regularization techniques. We then examine the effect of regularization when applied to two key enablers of SAM: (a) the contextualized embedding layers, for their ability to capture well-scaled representations with respect to the values, and (b) the prescaled attention maps, for carrying value-independent global contextual information. We depict the perks of each distilling strategy on two image recognition benchmarks (CIFAR100 and ImageNet-32) -- while (a) leads to a better overall accuracy, (b) helps enhance the rigidity by maintaining competitive performances. Furthermore, we identify the limitation imposed by the symmetric nature of regularization losses. To alleviate this, we propose an asymmetric variant and apply it to the pooled output distillation (POD) loss adapted for ViTs. Our experiments confirm that introducing asymmetry to POD boosts its plasticity while retaining stability across (a) and (b). Moreover, we acknowledge low forgetting measures for all the compared methods, indicating that ViTs might be naturally inclined continual learner
    Fiber Bundle Morphisms as a Framework for Modeling Many-to-Many Maps. (arXiv:2203.08189v2 [cs.LG] UPDATED)
    While it is not generally reflected in the `nice' datasets used for benchmarking machine learning algorithms, the real-world is full of processes that would be best described as many-to-many. That is, a single input can potentially yield many different outputs (whether due to noise, imperfect measurement, or intrinsic stochasticity in the process) and many different inputs can yield the same output (that is, the map is not injective). For example, imagine a sentiment analysis task where, due to linguistic ambiguity, a single statement can have a range of different sentiment interpretations while at the same time many distinct statements can represent the same sentiment. When modeling such a multivalued function $f: X \rightarrow Y$, it is frequently useful to be able to model the distribution on $f(x)$ for specific input $x$ as well as the distribution on fiber $f^{-1}(y)$ for specific output $y$. Such an analysis helps the user (i) better understand the variance intrinsic to the process they are studying and (ii) understand the range of specific input $x$ that can be used to achieve output $y$. Following existing work which used a fiber bundle framework to better model many-to-one processes, we describe how morphisms of fiber bundles provide a template for building models which naturally capture the structure of many-to-many processes.  ( 2 min )
    Reducing Neural Architecture Search Spaces with Training-Free Statistics and Computational Graph Clustering. (arXiv:2204.14103v1 [cs.LG])
    The computational demands of neural architecture search (NAS) algorithms are usually directly proportional to the size of their target search spaces. Thus, limiting the search to high-quality subsets can greatly reduce the computational load of NAS algorithms. In this paper, we present Clustering-Based REDuction (C-BRED), a new technique to reduce the size of NAS search spaces. C-BRED reduces a NAS space by clustering the computational graphs associated with its architectures and selecting the most promising cluster using proxy statistics correlated with network accuracy. When considering the NAS-Bench-201 (NB201) data set and the CIFAR-100 task, C-BRED selects a subset with 70% average accuracy instead of the whole space's 64% average accuracy.  ( 2 min )
    Abstraction for Deep Reinforcement Learning. (arXiv:2202.05839v3 [cs.LG] UPDATED)
    We characterise the problem of abstraction in the context of deep reinforcement learning. Various well established approaches to analogical reasoning and associative memory might be brought to bear on this issue, but they present difficulties because of the need for end-to-end differentiability. We review developments in AI and machine learning that could facilitate their adoption.
    Unsolved Problems in ML Safety. (arXiv:2109.13916v4 [cs.LG] UPDATED)
    Machine learning (ML) systems are rapidly increasing in size, are acquiring new capabilities, and are increasingly deployed in high-stakes settings. As with other powerful technologies, safety for ML should be a leading research priority. In response to emerging safety challenges in ML, such as those introduced by recent large-scale models, we provide a new roadmap for ML Safety and refine the technical problems that the field needs to address. We present four problems ready for research, namely withstanding hazards ("Robustness"), identifying hazards ("Monitoring"), reducing inherent model hazards ("Alignment"), and reducing systemic hazards ("Systemic Safety"). Throughout, we clarify each problem's motivation and provide concrete research directions.
    KnowAugNet: Multi-Source Medical Knowledge Augmented Medication Prediction Network with Multi-Level Graph Contrastive Learning. (arXiv:2204.11736v2 [cs.AI] UPDATED)
    Predicting medications is a crucial task in many intelligent healthcare systems. It can assist doctors in making informed medication decisions for patients according to electronic medical records (EMRs). However, medication prediction is a challenging data mining task due to the complex relations between medical codes. Most existing studies focus on utilizing inherent relations between homogeneous codes of medical ontology graph to enhance their representations using supervised methods, and few studies pay attention to the valuable relations between heterogeneous or homogeneous medical codes from history EMRs, which further limits the prediction performance and application scenarios. Therefore, to address these limitations, this paper proposes KnowAugNet, a multi-sourced medical knowledge augmented medication prediction network which can fully capture the diverse relations between medical codes via multi-level graph contrastive learning framework. Specifically, KnowAugNet first leverages the graph contrastive learning using graph attention network as the encoder to capture the implicit relations between homogeneous medical codes from the medical ontology graph and obtains the knowledge augmented medical codes embedding vectors. Then, it utilizes the graph contrastive learning using a weighted graph convolutional network as the encoder to capture the correlative relations between homogeneous or heterogeneous medical codes from the constructed medical prior relation graph and obtains the relation augmented medical codes embedding vectors. Finally, the augmented medical codes embedding vectors and the supervised medical codes embedding vectors are retrieved and input to the sequential learning network to capture the temporal relations of medical codes and predict medications for patients.  ( 2 min )
    Rockafellian Relaxation in Optimization under Uncertainty: Asymptotically Exact Formulations. (arXiv:2204.04762v2 [math.OC] UPDATED)
    In practice, optimization models are often prone to unavoidable inaccuracies due to lack of data and dubious assumptions. Traditionally, this placed special emphasis on risk-based and robust formulations, and their focus on "conservative" decisions. We develop, in contrast, an "optimistic" framework based on Rockafellian relaxations in which optimization is conducted not only over the original decision space but also jointly with a choice of model perturbation. The framework enables us to address challenging problems with ambiguous probability distributions from the areas of two-stage stochastic optimization without relatively complete recourse, probability functions lacking continuity properties, expectation constraints, and outlier analysis. We are also able to circumvent the fundamental difficulty in stochastic optimization that convergence of distributions fails to guarantee convergence of expectations. The framework centers on the novel concepts of exact and asymptotically exact Rockafellians, with interpretations of "negative" regularization emerging in certain settings. We illustrate the role of Phi-divergence, examine rates of convergence under changing distributions, and explore extensions to first-order optimality conditions. The main development is free of assumptions about convexity, smoothness, and even continuity of objective functions.
    RAMP-CNN: A Novel Neural Network for Enhanced Automotive Radar Object Recognition. (arXiv:2011.08981v2 [eess.SP] UPDATED)
    Millimeter-wave radars are being increasingly integrated into commercial vehicles to support new advanced driver-assistance systems by enabling robust and high-performance object detection, localization, as well as recognition - a key component of new environmental perception. In this paper, we propose a novel radar multiple-perspectives convolutional neural network (RAMP-CNN) that extracts the location and class of objects based on further processing of the range-velocity-angle (RVA) heatmap sequences. To bypass the complexity of 4D convolutional neural networks (NN), we propose to combine several lower-dimension NN models within our RAMP-CNN model that nonetheless approaches the performance upper-bound with lower complexity. The extensive experiments show that the proposed RAMP-CNN model achieves better average recall and average precision than prior works in all testing scenarios. Besides, the RAMP-CNN model is validated to work robustly under nighttime, which enables low-cost radars as a potential substitute for pure optical sensing under severe conditions.  ( 2 min )
    Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation. (arXiv:2202.02440v2 [cs.CV] UPDATED)
    In reinforcement learning for visual navigation, it is common to develop a model for each new task, and train that model from scratch with task-specific interactions in 3D environments. However, this process is expensive; massive amounts of interactions are needed for the model to generalize well. Moreover, this process is repeated whenever there is a change in the task type or the goal modality. We present a unified approach to visual navigation using a novel modular transfer learning model. Our model can effectively leverage its experience from one source task and apply it to multiple target tasks (e.g., ObjectNav, RoomNav, ViewNav) with various goal modalities (e.g., image, sketch, audio, label). Furthermore, our model enables zero-shot experience learning, whereby it can solve the target tasks without receiving any task-specific interactive training. Our experiments on multiple photorealistic datasets and challenging tasks show that our approach learns faster, generalizes better, and outperforms SoTA models by a significant margin.
    Human's Role in-the-Loop. (arXiv:2204.14192v1 [cs.DB])
    Data integration has been recently challenged by the need to handle large volumes of data, arriving at high velocity from a variety of sources, which demonstrate varying levels of veracity. This challenging setting, often referred to as big data, renders many of the existing techniques, especially those that are human-intensive, obsolete. Big data also produces technological advancements such as Internet of things, cloud computing, and deep learning, and accordingly, provides a new, exciting, and challenging research agenda. Given the availability of data and the improvement of machine learning techniques, this blog discusses the respective roles of humans and machines in achieving cognitive tasks in matching, aiming to determine whether traditional roles of humans and machines are subject to change. Such investigation, we believe, will pave a way to better utilize both human and machine resources in new and innovative manners. We shall discuss two possible modes of change, namely humans out and humans in. Humans out aim at exploring out-of-the-box latent matching reasoning using machine learning algorithms when attempting to overpower human matcher performance. Pursuing out-of-the-box thinking, machine and deep learning can be involved in matching. Humans in explores how to better involve humans in the matching loop by assigning human matchers with a symmetric role to algorithmic matcher in the matching process.  ( 2 min )
    Learning Anisotropic Interaction Rules from Individual Trajectories in a Heterogeneous Cellular Population. (arXiv:2204.14141v1 [q-bio.QM])
    Interacting particle system (IPS) models have proven to be highly successful for describing the spatial movement of organisms. However, it has proven challenging to infer the interaction rules directly from data. In the field of equation discovery, the Weak form Sparse Identification of Nonlinear Dynamics (WSINDy) methodology has been shown to be very computationally efficient for identifying the governing equations of complex systems, even in the presence of substantial noise. Motivated by the success of IPS models to describe the spatial movement of organisms, we develop WSINDy for second order IPSs to model the movement of communities of cells. Specifically, our approach learns the directional interaction rules that govern the dynamics of a heterogeneous population of migrating cells. Rather than aggregating cellular trajectory data into a single best-fit model, we learn the models for each individual cell. These models can then be efficiently classified according to the active classes of interactions present in the model. From these classifications, aggregated models are constructed hierarchically to simultaneously identify different species of cells present in the population and determine best-fit models for each species. We demonstrate the efficiency and proficiency of the method on several test scenarios, motivated by common cell migration experiments.
    Analysing the Influence of Attack Configurations on the Reconstruction of Medical Images in Federated Learning. (arXiv:2204.13808v1 [eess.IV])
    The idea of federated learning is to train deep neural network models collaboratively and share them with multiple participants without exposing their private training data to each other. This is highly attractive in the medical domain due to patients' privacy records. However, a recently proposed method called Deep Leakage from Gradients enables attackers to reconstruct data from shared gradients. This study shows how easy it is to reconstruct images for different data initialization schemes and distance measures. We show how data and model architecture influence the optimal choice of initialization scheme and distance measure configurations when working with single images. We demonstrate that the choice of initialization scheme and distance measure can significantly increase convergence speed and quality. Furthermore, we find that the optimal attack configuration depends largely on the nature of the target image distribution and the complexity of the model architecture.
    Pre-training helps Bayesian optimization too. (arXiv:2109.08215v3 [cs.LG] UPDATED)
    Bayesian optimization (BO) has become a popular strategy for global optimization of many expensive real-world functions. Contrary to a common belief that BO is suited to optimizing black-box functions, it actually requires domain knowledge on characteristics of those functions to deploy BO successfully. Such domain knowledge often manifests in Gaussian process priors that specify initial beliefs on functions. However, even with expert knowledge, it is not an easy task to select a prior. This is especially true for hyperparameter tuning problems on complex machine learning models, where landscapes of tuning objectives are often difficult to comprehend. We seek an alternative practice for setting these functional priors. In particular, we consider the scenario where we have data from similar functions that allow us to pre-train a tighter distribution a priori. Theoretically, we show a bounded regret of BO with pre-trained priors. To verify our approach in realistic model training setups, we collected a large multi-task hyperparameter tuning dataset by training tens of thousands of configurations of near-state-of-the-art models on popular image and text datasets, as well as a protein sequence dataset. Our results show that on average, our method is able to locate good hyperparameters at least 3 times more efficiently than the best competing methods.
    Data+Shift: Supporting visual investigation of data distribution shifts by data scientists. (arXiv:2204.14025v1 [cs.LG])
    Machine learning on data streams is increasingly more present in multiple domains. However, there is often data distribution shift that can lead machine learning models to make incorrect decisions. While there are automatic methods to detect when drift is happening, human analysis, often by data scientists, is essential to diagnose the causes of the problem and adjust the system. We propose Data+Shift, a visual analytics tool to support data scientists in the task of investigating the underlying factors of shift in data features in the context of fraud detection. Design requirements were derived from interviews with data scientists. Data+Shift is integrated with JupyterLab and can be used alongside other data science tools. We validated our approach with a think-aloud experiment where a data scientist used the tool for a fraud detection use case.
    H2H: Heterogeneous Model to Heterogeneous System Mapping with Computation and Communication Awareness. (arXiv:2204.13852v1 [cs.LG])
    The complex nature of real-world problems calls for heterogeneity in both machine learning (ML) models and hardware systems. The heterogeneity in ML models comes from multi-sensor perceiving and multi-task learning, i.e., multi-modality multi-task (MMMT), resulting in diverse deep neural network (DNN) layers and computation patterns. The heterogeneity in systems comes from diverse processing components, as it becomes the prevailing method to integrate multiple dedicated accelerators into one system. Therefore, a new problem emerges: heterogeneous model to heterogeneous system mapping (H2H). While previous mapping algorithms mostly focus on efficient computations, in this work, we argue that it is indispensable to consider computation and communication simultaneously for better system efficiency. We propose a novel H2H mapping algorithm with both computation and communication awareness; by slightly trading computation for communication, the system overall latency and energy consumption can be largely reduced. The superior performance of our work is evaluated based on MAESTRO modeling, demonstrating 15%-74% latency reduction and 23%-64% energy reduction compared with existing computation-prioritized mapping algorithms.  ( 2 min )
    Controlled Generation of Unseen Faults for Partial and OpenSet&Partial Domain Adaptation. (arXiv:2204.14068v1 [cs.LG])
    New operating conditions can result in a performance drop of fault diagnostics models due to the domain gap between the training and the testing data distributions. While several domain adaptation approaches have been proposed to overcome such domain shifts, their application is limited if the label spaces of the two domains are not congruent. To improve the transferability of the trained models, particularly in setups where only the healthy data class is shared between the two domains, we propose a new framework based on a Wasserstein GAN for Partial and OpenSet&Partial domain adaptation. The main contribution is the controlled fault data generation that enables to generate unobserved fault types and severity levels in the target domain by having only access to the healthy samples in the target domain and faulty samples in the source domain. To evaluate the ability of the proposed method to bridge domain gaps in different domain adaption settings, we conduct Partial as well as OpenSet&Partial domain adaptation experiments on two bearing fault diagnostics case studies. The results show the versatility of the framework and that the synthetically generated fault data helps bridging the domain gaps, especially in instances where the domain gap is large.
    Towards Generalizable Semantic Product Search by Text Similarity Pre-training on Search Click Logs. (arXiv:2204.05231v2 [cs.IR] UPDATED)
    Recently, semantic search has been successfully applied to e-commerce product search and the learned semantic space(s) for query and product encoding are expected to generalize to unseen queries or products. Yet, whether generalization can conveniently emerge has not been thoroughly studied in the domain thus far. In this paper, we examine several general-domain and domain-specific pre-trained Roberta variants and discover that general-domain fine-tuning does not help generalization, which aligns with the discovery of prior art. Proper domain-specific fine-tuning with clickstream data can lead to better model generalization, based on a bucketed analysis of a publicly available manual annotated query-product pair da  ( 2 min )
    Recommendations on test datasets for evaluating AI solutions in pathology. (arXiv:2204.14226v1 [eess.IV])
    Artificial intelligence (AI) solutions that automatically extract information from digital histology images have shown great promise for improving pathological diagnosis. Prior to routine use, it is important to evaluate their predictive performance and obtain regulatory approval. This assessment requires appropriate test datasets. However, compiling such datasets is challenging and specific recommendations are missing. A committee of various stakeholders, including commercial AI developers, pathologists, and researchers, discussed key aspects and conducted extensive literature reviews on test datasets in pathology. Here, we summarize the results and derive general recommendations for the collection of test datasets. We address several questions: Which and how many images are needed? How to deal with low-prevalence subsets? How can potential bias be detected? How should datasets be reported? What are the regulatory requirements in different countries? The recommendations are intended to help AI developers demonstrate the utility of their products and to help regulatory agencies and end users verify reported performance measures. Further research is needed to formulate criteria for sufficiently representative test datasets so that AI solutions can operate with less user intervention and better support diagnostic workflows in the future.  ( 2 min )
    Semi-Discrete Optimal Transport: Hardness, Regularization and Numerical Solution. (arXiv:2103.06263v2 [cs.LG] UPDATED)
    Semi-discrete optimal transport problems, which evaluate the Wasserstein distance between a discrete and a generic (possibly non-discrete) probability measure, are believed to be computationally hard. Even though such problems are ubiquitous in statistics, machine learning and computer vision, however, this perception has not yet received a theoretical justification. To fill this gap, we prove that computing the Wasserstein distance between a discrete probability measure supported on two points and the Lebesgue measure on the standard hypercube is already #P-hard. This insight prompts us to seek approximate solutions for semi-discrete optimal transport problems. We thus perturb the underlying transportation cost with an additive disturbance governed by an ambiguous probability distribution, and we introduce a distributionally robust dual optimal transport problem whose objective function is smoothed with the most adverse disturbance distributions from within a given ambiguity set. We further show that smoothing the dual objective function is equivalent to regularizing the primal objective function, and we identify several ambiguity sets that give rise to several known and new regularization schemes. As a byproduct, we discover an intimate relation between semi-discrete optimal transport problems and discrete choice models traditionally studied in psychology and economics. To solve the regularized optimal transport problems efficiently, we use a stochastic gradient descent algorithm with imprecise stochastic gradient oracles. A new convergence analysis reveals that this algorithm improves the best known convergence guarantee for semi-discrete optimal transport problems with entropic regularizers.
    Testing the Generalization of Neural Language Models for COVID-19 Misinformation Detection. (arXiv:2111.07819v3 [cs.CL] UPDATED)
    A drastic rise in potentially life-threatening misinformation has been a by-product of the COVID-19 pandemic. Computational support to identify false information within the massive body of data on the topic is crucial to prevent harm. Researchers proposed many methods for flagging online misinformation related to COVID-19. However, these methods predominantly target specific content types (e.g., news) or platforms (e.g., Twitter). The methods' capabilities to generalize were largely unclear so far. We evaluate fifteen Transformer-based models on five COVID-19 misinformation datasets that include social media posts, news articles, and scientific papers to fill this gap. We show tokenizers and models tailored to COVID-19 data do not provide a significant advantage over general-purpose ones. Our study provides a realistic assessment of models for detecting COVID-19 misinformation. We expect that evaluating a broad spectrum of datasets and models will benefit future research in developing misinformation detection systems.
    To Trust or Not To Trust Prediction Scores for Membership Inference Attacks. (arXiv:2111.09076v2 [cs.LG] UPDATED)
    Membership inference attacks (MIAs) aim to determine whether a specific sample was used to train a predictive model. Knowing this may indeed lead to a privacy breach. Most MIAs, however, make use of the model's prediction scores - the probability of each output given some input - following the intuition that the trained model tends to behave differently on its training data. We argue that this is a fallacy for many modern deep network architectures. Consequently, MIAs will miserably fail since overconfidence leads to high false-positive rates not only on known domains but also on out-of-distribution data and implicitly acts as a defense against MIAs. Specifically, using generative adversarial networks, we are able to produce a potentially infinite number of samples falsely classified as part of the training data. In other words, the threat of MIAs is overestimated, and less information is leaked than previously assumed. Moreover, there is actually a trade-off between the overconfidence of models and their susceptibility to MIAs: the more classifiers know when they do not know, making low confidence predictions, the more they reveal the training data.
    Autonomous In-Situ Soundscape Augmentation via Joint Selection of Masker and Gain. (arXiv:2204.13883v1 [eess.AS])
    The selection of maskers and playback gain levels in a soundscape augmentation system is crucial to its effectiveness in improving the overall acoustic comfort of a given environment. Traditionally, the selection of appropriate maskers and gain levels has been informed by expert opinion, which may not representative of the target population, or by listening tests, which can be time-consuming and labour-intensive. Furthermore, the resulting static choices of masker and gain are often inflexible to the dynamic nature of real-world soundscapes. In this work, we utilized a deep learning model to perform joint selection of the optimal masker and its gain level for a given soundscape. The proposed model was designed with highly modular building blocks, allowing for an optimized inference process that can quickly search through a large number of masker and gain combinations. In addition, we introduced the use of feature-domain soundscape augmentation conditioned on the digital gain level, eliminating the computationally expensive waveform-domain mixing process during inference time, as well as the tedious pre-calibration process required for new maskers. The proposed system was validated on a large-scale dataset of subjective responses to augmented soundscapes with more than 440 participants, ensuring the ability of the model to predict combined effect of the masker and its gain level on the perceptual pleasantness level.  ( 2 min )
    Wide and Deep Neural Networks Achieve Optimality for Classification. (arXiv:2204.14126v1 [cs.LG])
    While neural networks are used for classification tasks across domains, a long-standing open problem in machine learning is determining whether neural networks trained using standard procedures are optimal for classification, i.e., whether such models minimize the probability of misclassification for arbitrary data distributions. In this work, we identify and construct an explicit set of neural network classifiers that achieve optimality. Since effective neural networks in practice are typically both wide and deep, we analyze infinitely wide networks that are also infinitely deep. In particular, using the recent connection between infinitely wide neural networks and Neural Tangent Kernels, we provide explicit activation functions that can be used to construct networks that achieve optimality. Interestingly, these activation functions are simple and easy to implement, yet differ from commonly used activations such as ReLU or sigmoid. More generally, we create a taxonomy of infinitely wide and deep networks and show that these models implement one of three well-known classifiers depending on the activation function used: (1) 1-nearest neighbor (model predictions are given by the label of the nearest training example); (2) majority vote (model predictions are given by the label of the class with greatest representation in the training set); or (3) singular kernel classifiers (a set of classifiers containing those that achieve optimality). Our results highlight the benefit of using deep networks for classification tasks, in contrast to regression tasks, where excessive depth is harmful.
    A Framework for Constructing Machine Learning Models with Feature Set Optimisation for Evapotranspiration Partitioning. (arXiv:2204.14142v1 [cs.LG])
    A deeper understanding of the drivers of evapotranspiration and the modelling of its constituent parts (evaporation and transpiration) could be of significant importance to the monitoring and management of water resources globally over the coming decades. In this work, we developed a framework to identify the best performing machine learning algorithm from a candidate set, select optimal predictive features as well as ranking features in terms of their im- portance to predictive accuracy. Our experiments used 3 separate feature sets across 4 wetland sites as input into 8 candidate machine learning algorithms, providing 96 sets of experimental configurations. Given this high number of parameters, our results show strong evidence that there is no singularly optimal machine learning algorithm or feature set across all of the wetland sites studied despite their similarities. A key finding discovered when examining feature importance is that methane flux, a feature whose relationship with evapotranspiration is not generally examined, may contribute to further biophysical process understanding.
    Adversarially-regularized mixed effects deep learning (ARMED) models for improved interpretability, performance, and generalization on clustered data. (arXiv:2202.11783v3 [cs.LG] UPDATED)
    Natural science datasets frequently violate assumptions of independence. Samples may be clustered (e.g. by study site, subject, or experimental batch), leading to spurious associations, poor model fitting, and confounded analyses. While largely unaddressed in deep learning, this problem has been handled in the statistics community through mixed effects models, which separate cluster-invariant fixed effects from cluster-specific random effects. We propose a general-purpose framework for Adversarially-Regularized Mixed Effects Deep learning (ARMED) models through non-intrusive additions to existing neural networks: 1) an adversarial classifier constraining the original model to learn only cluster-invariant features, 2) a random effects subnetwork capturing cluster-specific features, and 3) an approach to apply random effects to clusters unseen during training. We apply ARMED to dense, convolutional, and autoencoder neural networks on 4 applications including simulated nonlinear data, dementia prognosis and diagnosis, and live-cell image analysis. Compared to prior techniques, ARMED models better distinguish confounded from true associations in simulations and learn more biologically plausible features in clinical applications. They can also quantify inter-cluster variance and visualize cluster effects in data. Finally, ARMED improves accuracy on data from clusters seen during training (up to 28% vs. conventional models) and generalization to unseen clusters (up to 9% vs. conventional models).
    COVID-Net US-X: Enhanced Deep Neural Network for Detection of COVID-19 Patient Cases from Convex Ultrasound Imaging Through Extended Linear-Convex Ultrasound Augmentation Learning. (arXiv:2204.13851v1 [eess.IV])
    As the global population continues to face significant negative impact by the on-going COVID-19 pandemic, there has been an increasing usage of point-of-care ultrasound (POCUS) imaging as a low-cost and effective imaging modality of choice in the COVID-19 clinical workflow. A major barrier with widespread adoption of POCUS in the COVID-19 clinical workflow is the scarcity of expert clinicians that can interpret POCUS examinations, leading to considerable interest in deep learning-driven clinical decision support systems to tackle this challenge. A major challenge to building deep neural networks for COVID-19 screening using POCUS is the heterogeneity in the types of probes used to capture ultrasound images (e.g., convex vs. linear probes), which can lead to very different visual appearances. In this study, we explore the impact of leveraging extended linear-convex ultrasound augmentation learning on producing enhanced deep neural networks for COVID-19 assessment, where we conduct data augmentation on convex probe data alongside linear probe data that have been transformed to better resemble convex probe data. Experimental results using an efficient deep columnar anti-aliased convolutional neural network designed via a machined-driven design exploration strategy (which we name COVID-Net US-X) show that the proposed extended linear-convex ultrasound augmentation learning significantly increases performance, with a gain of 5.1% in test accuracy and 13.6% in AUC.
    PokeBNN: A Binary Pursuit of Lightweight Accuracy. (arXiv:2112.00133v2 [cs.LG] UPDATED)
    Optimization of Top-1 ImageNet promotes enormous networks that may be impractical in inference settings. Binary neural networks (BNNs) have the potential to significantly lower the compute intensity but existing models suffer from low quality. To overcome this deficiency, we propose PokeConv, a binary convolution block which improves quality of BNNs by techniques such as adding multiple residual paths, and tuning the activation function. We apply it to ResNet-50 and optimize ResNet's initial convolutional layer which is hard to binarize. We name the resulting network family PokeBNN. These techniques are chosen to yield favorable improvements in both top-1 accuracy and the network's cost. In order to enable joint optimization of the cost together with accuracy, we define arithmetic computation effort (ACE), a hardware- and energy-inspired cost metric for quantized and binarized networks. We also identify a need to optimize an under-explored hyper-parameter controlling the binarization gradient approximation. We establish a new, strong state-of-the-art (SOTA) on top-1 accuracy together with commonly-used CPU64 cost, ACE cost and network size metrics. ReActNet-Adam, the previous SOTA in BNNs, achieved a 70.5% top-1 accuracy with 7.9 ACE. A small variant of PokeBNN achieves 70.5% top-1 with 2.6 ACE, more than 3x reduction in cost; a larger PokeBNN achieves 75.6% top-1 with 7.8 ACE, more than 5% improvement in accuracy without increasing the cost. PokeBNN implementation in JAX/Flax and reproduction instructions are available in AQT repository: https://github.com/google/aqt
    Altruist: Argumentative Explanations through Local Interpretations of Predictive Models. (arXiv:2010.07650v2 [cs.LG] UPDATED)
    Explainable AI is an emerging field providing solutions for acquiring insights into automated systems' rationale. It has been put on the AI map by suggesting ways to tackle key ethical and societal issues. Existing explanation techniques are often not comprehensible to the end user. Lack of evaluation and selection criteria also makes it difficult for the end user to choose the most suitable technique. In this study, we combine logic-based argumentation with Interpretable Machine Learning, introducing a preliminary meta-explanation methodology that identifies the truthful parts of feature importance oriented interpretations. This approach, in addition to being used as a meta-explanation technique, can be used as an evaluation or selection tool for multiple feature importance techniques. Experimentation strongly indicates that an ensemble of multiple interpretation techniques yields considerably more truthful explanations.
    Identifying Critical LMS Features for Predicting At-risk Students. (arXiv:2204.13700v1 [cs.LG])
    Learning management systems (LMSs) have become essential in higher education and play an important role in helping educational institutions to promote student success. Traditionally, LMSs have been used by postsecondary institutions in administration, reporting, and delivery of educational content. In this paper, we present an additional use of LMS by using its data logs to perform data-analytics and identify academically at-risk students. The data-driven insights would allow educational institutions and educators to develop and implement pedagogical interventions targeting academically at-risk students. We used anonymized data logs created by Brightspace LMS during fall 2019, spring 2020, and fall 2020 semesters at our college. Supervised machine learning algorithms were used to predict the final course performance of students, and several algorithms were found to perform well with accuracy above 90%. SHAP value method was used to assess the relative importance of features used in the predictive models. Unsupervised learning was also used to group students into different clusters based on the similarities in their interaction/involvement with LMS. In both of supervised and unsupervised learning, we identified two most-important features (Number_Of_Assignment_Submissions and Content_Completed). More importantly, our study lays a foundation and provides a framework for developing a real-time data analytics metric that may be incorporated into a LMS.  ( 2 min )
    Multimodal Transformer-based Model for Buchwald-Hartwig and Suzuki-Miyaura Reaction Yield Prediction. (arXiv:2204.14062v1 [cs.LG])
    Predicting the yield percentage of a chemical reaction is useful in many aspects such as reducing wet-lab experimentation by giving the priority to the reactions with a high predicted yield. In this work we investigated the use of multiple type inputs to predict chemical reaction yield. We used simplified molecular-input line-entry system (SMILES) as well as calculated chemical descriptors as model inputs. The model consists of a pre-trained bidirectional transformer-based encoder (BERT) and a multi-layer perceptron (MLP) with a regression head to predict the yield. We experimented on two high throughput experimentation (HTE) datasets for Buchwald-Hartwig and Suzuki-Miyaura reactions. The experiments show improvements in the prediction on both datasets compared to systems using only SMILES or chemical descriptors as input. We also tested the model's performance on out-of-sample dataset splits of Buchwald-Hartwig and achieved comparable results with the state-of-the-art. In addition to predicting the yield, we demonstrated the model's ability to suggest the optimum (highest yield) reaction conditions. The model was able to suggest conditions that achieves 94% of the optimum reported yields. This proves the model to be useful in achieving the best results in the wet lab without expensive experimentation.
    Few-shot learning for medical text: A systematic review. (arXiv:2204.14081v1 [cs.CL])
    Objective: Few-shot learning (FSL) methods require small numbers of labeled instances for training. As many medical topics have limited annotated textual data in practical settings, FSL-based natural language processing (NLP) methods hold substantial promise. We aimed to conduct a systematic review to explore the state of FSL methods for medical NLP. Materials and Methods: We searched for articles published between January 2016 and August 2021 using PubMed/Medline, Embase, ACL Anthology, and IEEE Xplore Digital Library. To identify the latest relevant methods, we also searched other sources such as preprint servers (eg., medRxiv) via Google Scholar. We included all articles that involved FSL and any type of medical text. We abstracted articles based on data source(s), aim(s), training set size(s), primary method(s)/approach(es), and evaluation method(s). Results: 31 studies met our inclusion criteria-all published after 2018; 22 (71%) since 2020. Concept extraction/named entity recognition was the most frequently addressed task (13/31; 42%), followed by text classification (10/31; 32%). Twenty-one (68%) studies reconstructed existing datasets to create few-shot scenarios synthetically, and MIMIC-III was the most frequently used dataset (7/31; 23%). Common methods included FSL with attention mechanisms (12/31; 39%), prototypical networks (8/31; 26%), and meta-learning (6/31; 19%). Discussion: Despite the potential for FSL in biomedical NLP, progress has been limited compared to domain-independent FSL. This may be due to the paucity of standardized, public datasets, and the relative underperformance of FSL methods on biomedical topics. Creation and release of specialized datasets for biomedical FSL may aid method development by enabling comparative analyses.
    MarkovGNN: Graph Neural Networks on Markov Diffusion. (arXiv:2202.02470v2 [cs.LG] UPDATED)
    Most real-world networks contain well-defined community structures where nodes are densely connected internally within communities. To learn from these networks, we develop MarkovGNN that captures the formation and evolution of communities directly in different convolutional layers. Unlike most Graph Neural Networks (GNNs) that consider a static graph at every layer, MarkovGNN generates different stochastic matrices using a Markov process and then uses these community-capturing matrices in different layers. MarkovGNN is a general approach that could be used with most existing GNNs. We experimentally show that MarkovGNN outperforms other GNNs for clustering, node classification, and visualization tasks. The source code of MarkovGNN is publicly available at \url{https://github.com/HipGraph/MarkovGNN}.
    GCN-FFNN: A Two-Stream Deep Model for Learning Solution to Partial Differential Equations. (arXiv:2204.13744v1 [cs.LG])
    This paper introduces a novel two-stream deep model based on graph convolutional network (GCN) architecture and feed-forward neural networks (FFNN) for learning the solution of nonlinear partial differential equations (PDEs). The model aims at incorporating both graph and grid input representations using two streams corresponding to GCN and FFNN models, respectively. Each stream layer receives and processes its own input representation. As opposed to FFNN which receives a grid-like structure, the GCN stream layer operates on graph input data where the neighborhood information is incorporated through the adjacency matrix of the graph. In this way, the proposed GCN-FFNN model learns from two types of input representations, i.e. grid and graph data, obtained via the discretization of the PDE domain. The GCN-FFNN model is trained in two phases. In the first phase, the model parameters of each stream are trained separately. Both streams employ the same error function to adjust their parameters by enforcing the models to satisfy the given PDE as well as its initial and boundary conditions on grid or graph collocation (training) data. In the second phase, the learned parameters of two-stream layers are frozen and their learned representation solutions are fed to fully connected layers whose parameters are learned using the previously used error function. The learned GCN-FFNN model is tested on test data located both inside and outside the PDE domain. The obtained numerical results demonstrate the applicability and efficiency of the proposed GCN-FFNN model over individual GCN and FFNN models on 1D-Burgers, 1D-Schr\"odinger, 2D-Burgers and 2D-Schr\"odinger equations.
    ROS-X-Habitat: Bridging the ROS Ecosystem with Embodied AI. (arXiv:2109.07703v3 [cs.RO] UPDATED)
    We introduce ROS-X-Habitat, a software interface that bridges the AI Habitat platform for embodied learning-based agents with other robotics resources via ROS. This interface not only offers standardized communication protocols between embodied agents and simulators, but also enables physically and photorealistic simulation that benefits the training and/or testing of vision-based embodied agents. With this interface, roboticists can evaluate their own Habitat RL agents in another ROS-based simulator or use Habitat Sim v2 as the test bed for their own robotic algorithms. Through in silico experiments, we demonstrate that ROS-X-Habitat has minimal impact on the navigation performance and simulation speed of a Habitat RGBD agent; that a standard set of ROS mapping, planning and navigation tools can run in Habitat Sim v2; and that a Habitat agent can run in the standard ROS simulator Gazebo.
    A study of tree-based methods and their combination. (arXiv:2204.13916v1 [stat.ML])
    Tree-based methods are popular machine learning techniques used in various fields. In this work, we review their foundations and a general framework the importance sampled learning ensemble (ISLE) that accelerates their fitting process. Furthermore, we describe a model combination strategy called the adaptive regression by mixing (ARM), which is feasible for tree- based methods via ISLE. Moreover, three modified ISLEs are proposed, and their performance are evaluated on the real data sets.
    Randomized Smoothing under Attack: How Good is it in Pratice?. (arXiv:2204.14187v1 [cs.CR])
    Randomized smoothing is a recent and celebrated solution to certify the robustness of any classifier. While it indeed provides a theoretical robustness against adversarial attacks, the dimensionality of current classifiers necessarily imposes Monte Carlo approaches for its application in practice. This paper questions the effectiveness of randomized smoothing as a defense, against state of the art black-box attacks. This is a novel perspective, as previous research works considered the certification as an unquestionable guarantee. We first formally highlight the mismatch between a theoretical certification and the practice of attacks on classifiers. We then perform attacks on randomized smoothing as a defense. Our main observation is that there is a major mismatch in the settings of the RS for obtaining high certified robustness or when defeating black box attacks while preserving the classifier accuracy.
    Cost Effective MLaaS Federation: A Combinatorial Reinforcement Learning Approach. (arXiv:2204.13971v1 [cs.LG])
    With the advancement of deep learning techniques, major cloud providers and niche machine learning service providers start to offer their cloud-based machine learning tools, also known as machine learning as a service (MLaaS), to the public. According to our measurement, for the same task, these MLaaSes from different providers have varying performance due to the proprietary datasets, models, etc. Federating different MLaaSes together allows us to improve the analytic performance further. However, naively aggregating results from different MLaaSes not only incurs significant momentary cost but also may lead to sub-optimal performance gain due to the introduction of possible false-positive results. In this paper, we propose Armol, a framework to federate the right selection of MLaaS providers to achieve the best possible analytic performance. We first design a word grouping algorithm to unify the output labels across different providers. We then present a deep combinatorial reinforcement learning based-approach to maximize the accuracy while minimizing the cost. The predictions from the selected providers are then aggregated together using carefully chosen ensemble strategies. The real-world trace-driven evaluation further demonstrates that Armol is able to achieve the same accuracy results with $67\%$ less inference cost.
    Topological Data Analysis in Time Series: Temporal Filtration and Application to Single-Cell Genomics. (arXiv:2204.14048v1 [cs.LG])
    The absence of a conventional association between the cell-cell cohabitation and its emergent dynamics into cliques during development has hindered our understanding of how cell populations proliferate, differentiate, and compete, i.e. the cell ecology. With the recent advancement of the single-cell RNA-sequencing (RNA-seq), we can potentially describe such a link by constructing network graphs that characterize the similarity of the gene expression profiles of the cell-specific transcriptional programs, and analyzing these graphs systematically using the summary statistics informed by the algebraic topology. We propose the single-cell topological simplicial analysis (scTSA). Applying this approach to the single-cell gene expression profiles from local networks of cells in different developmental stages with different outcomes reveals a previously unseen topology of cellular ecology. These networks contain an abundance of cliques of single-cell profiles bound into cavities that guide the emergence of more complicated habitation forms. We visualize these ecological patterns with topological simplicial architectures of these networks, compared with the null models. Benchmarked on the single-cell RNA-seq data of zebrafish embryogenesis spanning 38,731 cells, 25 cell types and 12 time steps, our approach highlights the gastrulation as the most critical stage, consistent with consensus in developmental biology. As a nonlinear, model-independent, and unsupervised framework, our approach can also be applied to tracing multi-scale cell lineage, identifying critical stages, or creating pseudo-time series.
    No Task Left Behind: Multi-Task Learning of Knowledge Tracing and Option Tracing for Better Student Assessment. (arXiv:2204.14006v1 [cs.CY])
    Student assessment is one of the most fundamental tasks in the field of AI Education (AIEd). One of the most common approach to student assessment is Knowledge Tracing (KT), which evaluates a student's knowledge state by predicting whether the student will answer a given question correctly or not. However, in the context of multiple choice (polytomous) questions, conventional KT approaches are limited in that they only consider the binary (dichotomous) correctness label (i.e., correct or incorrect), and disregard the specific option chosen by the student. Meanwhile, Option Tracing (OT) attempts to model a student by predicting which option they will choose for a given question, but overlooks the correctness information. In this paper, we propose Dichotomous-Polytomous Multi-Task Learning (DP-MTL), a multi-task learning framework that combines KT and OT for more precise student assessment. In particular, we show that the KT objective acts as a regularization term for OT in the DP-MTL framework, and propose an appropriate architecture for applying our method on top of existing deep learning-based KT models. We experimentally confirm that DP-MTL significantly improves both KT and OT performances, and also benefits downstream tasks such as Score Prediction (SP).
    RoSA: A Robust Self-Aligned Framework for Node-Node Graph Contrastive Learning. (arXiv:2204.13846v1 [cs.LG])
    Graph contrastive learning has gained significant progress recently. However, existing works have rarely explored non-aligned node-node contrasting. In this paper, we propose a novel graph contrastive learning method named RoSA that focuses on utilizing non-aligned augmented views for node-level representation learning. First, we leverage the earth mover's distance to model the minimum effort to transform the distribution of one view to the other as our contrastive objective, which does not require alignment between views. Then we introduce adversarial training as an auxiliary method to increase sampling diversity and enhance the robustness of our model. Experimental results show that RoSA outperforms a series of graph contrastive learning frameworks on homophilous, non-homophilous and dynamic graphs, which validates the effectiveness of our work. To the best of our awareness, RoSA is the first work focuses on the non-aligned node-node graph contrastive learning problem. Our codes are available at: \href{https://github.com/ZhuYun97/RoSA}{\texttt{https://github.com/ZhuYun97/RoSA}}
    Formulating Robustness Against Unforeseen Attacks. (arXiv:2204.13779v1 [cs.LG])
    Existing defenses against adversarial examples such as adversarial training typically assume that the adversary will conform to a specific or known threat model, such as $\ell_p$ perturbations within a fixed budget. In this paper, we focus on the scenario where there is a mismatch in the threat model assumed by the defense during training, and the actual capabilities of the adversary at test time. We ask the question: if the learner trains against a specific "source" threat model, when can we expect robustness to generalize to a stronger unknown "target" threat model during test-time? Our key contribution is to formally define the problem of learning and generalization with an unforeseen adversary, which helps us reason about the increase in adversarial risk from the conventional perspective of a known adversary. Applying our framework, we derive a generalization bound which relates the generalization gap between source and target threat models to variation of the feature extractor, which measures the expected maximum difference between extracted features across a given threat model. Based on our generalization bound, we propose adversarial training with variation regularization (AT-VR) which reduces variation of the feature extractor across the source threat model during training. We empirically demonstrate that AT-VR can lead to improved generalization to unforeseen attacks during test-time compared to standard adversarial training on Gaussian and image datasets.
    Making sense of violence risk predictions using clinical notes. (arXiv:2204.13976v1 [cs.LG])
    Violence risk assessment in psychiatric institutions enables interventions to avoid violence incidents. Clinical notes written by practitioners and available in electronic health records (EHR) are valuable resources that are seldom used to their full potential. Previous studies have attempted to assess violence risk in psychiatric patients using such notes, with acceptable performance. However, they do not explain why classification works and how it can be improved. We explore two methods to better understand the quality of a classifier in the context of clinical note analysis: random forests using topic models, and choice of evaluation metric. These methods allow us to understand both our data and our methodology more profoundly, setting up the groundwork to work on improved models that build upon this understanding. This is particularly important when it comes to the generalizability of evaluated classifiers to new data, a trustworthiness problem that is of great interest due to the increased availability of new data in electronic format.
    Backdoor Attacks in Federated Learning by Rare Embeddings and Gradient Ensembling. (arXiv:2204.14017v1 [cs.LG])
    Recent advances in federated learning have demonstrated its promising capability to learn on decentralized datasets. However, a considerable amount of work has raised concerns due to the potential risks of adversaries participating in the framework to poison the global model for an adversarial purpose. This paper investigates the feasibility of model poisoning for backdoor attacks through \textit{rare word embeddings of NLP models} in text classification and sequence-to-sequence tasks. In text classification, less than 1\% of adversary clients suffices to manipulate the model output without any drop in the performance of clean sentences. For a less complex dataset, a mere 0.1\% of adversary clients is enough to poison the global model effectively. We also propose a technique specialized in the federated learning scheme called gradient ensemble, which enhances the backdoor performance in all experimental settings.
    Explainable AI via Learning to Optimize. (arXiv:2204.14174v1 [math.OC])
    Indecipherable black boxes are common in machine learning (ML), but applications increasingly require explainable artificial intelligence (XAI). The core of XAI is to establish transparent and interpretable data-driven algorithms. This work provides concrete tools for XAI in situations where prior knowledge must be encoded and untrustworthy inferences flagged. We use the "learn to optimize" (L2O) methodology wherein each inference solves a data-driven optimization problem. Our L2O models are straightforward to implement, directly encode prior knowledge, and yield theoretical guarantees (e.g. satisfaction of constraints). We also propose use of interpretable certificates to verify whether model inferences are trustworthy. Numerical examples are provided in the applications of dictionary-based signal recovery, CT imaging, and arbitrage trading of cryptoassets.
    Fix the Noise: Disentangling Source Feature for Transfer Learning of StyleGAN. (arXiv:2204.14079v1 [cs.CV])
    Transfer learning of StyleGAN has recently shown great potential to solve diverse tasks, especially in domain translation. Previous methods utilized a source model by swapping or freezing weights during transfer learning, however, they have limitations on visual quality and controlling source features. In other words, they require additional models that are computationally demanding and have restricted control steps that prevent a smooth transition. In this paper, we propose a new approach to overcome these limitations. Instead of swapping or freezing, we introduce a simple feature matching loss to improve generation quality. In addition, to control the degree of source features, we train a target model with the proposed strategy, FixNoise, to preserve the source features only in a disentangled subspace of a target feature space. Owing to the disentangled feature space, our method can smoothly control the degree of the source features in a single model. Extensive experiments demonstrate that the proposed method can generate more consistent and realistic images than previous works.
    Federated Learning: Balancing the Thin Line Between Data Intelligence and Privacy. (arXiv:2204.13697v1 [cs.LG])
    Federated learning holds great promise in learning from fragmented sensitive data and has revolutionized how machine learning models are trained. This article provides a systematic overview and detailed taxonomy of federated learning. We investigate the existing security challenges in federated learning and provide a comprehensive overview of established defense techniques for data poisoning, inference attacks, and model poisoning attacks. The work also presents an overview of current training challenges for federated learning, focusing on handling non-i.i.d. data, high dimensionality issues, and heterogeneous architecture, and discusses several solutions for the associated challenges. Finally, we discuss the remaining challenges in managing federated learning training and suggest focused research directions to address the open questions. Potential candidate areas for federated learning, including IoT ecosystem, healthcare applications, are discussed with a particular focus on banking and financial domains.
    CATNet: Cross-event Attention-based Time-aware Network for Medical Event Prediction. (arXiv:2204.13847v1 [cs.LG])
    Medical event prediction (MEP) is a fundamental task in the medical domain, which needs to predict medical events, including medications, diagnosis codes, laboratory tests, procedures, outcomes, and so on, according to historical medical records. The task is challenging as medical data is a type of complex time series data with heterogeneous and temporal irregular characteristics. Many machine learning methods that consider the two characteristics have been proposed for medical event prediction. However, most of them consider the two characteristics separately and ignore the correlations among different types of medical events, especially relations between historical medical events and target medical events. In this paper, we propose a novel neural network based on attention mechanism, called cross-event attention-based time-aware network (CATNet), for medical event prediction. It is a time-aware, event-aware and task-adaptive method with the following advantages: 1) modeling heterogeneous information and temporal information in a unified way and considering temporal irregular characteristics locally and globally respectively, 2) taking full advantage of correlations among different types of events via cross-event attention. Experiments on two public datasets (MIMIC-III and eICU) show CATNet can be adaptive with different MEP tasks and outperforms other state-of-the-art methods on various MEP tasks. The source code of CATNet will be released after this manuscript is accepted.
    VPNets: Volume-preserving neural networks for learning source-free dynamics. (arXiv:2204.13843v1 [cs.LG])
    We propose volume-preserving networks (VPNets) for learning unknown source-free dynamical systems using trajectory data. We propose three modules and combine them to obtain two network architectures, coined R-VPNet and LA-VPNet. The distinct feature of the proposed models is that they are intrinsic volume-preserving. In addition, the corresponding approximation theorems are proved, which theoretically guarantee the expressivity of the proposed VPNets to learn source-free dynamics. The effectiveness, generalization ability and structure-preserving property of the VP-Nets are demonstrated by numerical experiments.
    Short-Term Density Forecasting of Low-Voltage Load using Bernstein-Polynomial Normalizing Flows. (arXiv:2204.13939v1 [cs.LG])
    The transition to a fully renewable energy grid requires better forecasting of demand at the low-voltage level to increase efficiency and ensure reliable control. However, high fluctuations and increasing electrification cause huge forecast variability, not reflected in traditional point estimates. Probabilistic load forecasts take future uncertainties into account and thus allow more informed decision-making for the planning and operation of low-carbon energy systems. We propose an approach for flexible conditional density forecasting of short-term load based on Bernstein polynomial normalizing flows, where a neural network controls the parameters of the flow. In an empirical study with 363 smart meter customers, our density predictions compare favorably against Gaussian and Gaussian mixture densities. Also, they outperform a non-parametric approach based on the pinball loss for 24h-ahead load forecasting for two different neural network architectures.
    Learning from Natural Language Feedback. (arXiv:2204.14146v1 [cs.CL])
    Pretrained language models often do not perform tasks in ways that are in line with our preferences, e.g., generating offensive text or factually incorrect summaries. Recent work approaches the above issue by learning from a simple form of human evaluation: comparisons between pairs of model-generated task outputs. Comparison feedback conveys limited information about human preferences per human evaluation. Here, we propose to learn from natural language feedback, which conveys more information per human evaluation. We learn from language feedback on model outputs using a three-step learning algorithm. First, we condition the language model on the initial output and feedback to generate many refinements. Second, we choose the refinement with the highest similarity to the feedback. Third, we finetune a language model to maximize the likelihood of the chosen refinement given the input. In synthetic experiments, we first evaluate whether language models accurately incorporate feedback to produce refinements, finding that only large language models (175B parameters) do so. Using only 100 samples of human-written feedback, our learning algorithm finetunes a GPT-3 model to roughly human-level summarization.
    High Dimensional Bayesian Optimization with Kernel Principal Component Analysis. (arXiv:2204.13753v1 [cs.LG])
    Bayesian Optimization (BO) is a surrogate-based global optimization strategy that relies on a Gaussian Process regression (GPR) model to approximate the objective function and an acquisition function to suggest candidate points. It is well-known that BO does not scale well for high-dimensional problems because the GPR model requires substantially more data points to achieve sufficient accuracy and acquisition optimization becomes computationally expensive in high dimensions. Several recent works aim at addressing these issues, e.g., methods that implement online variable selection or conduct the search on a lower-dimensional sub-manifold of the original search space. Advancing our previous work of PCA-BO that learns a linear sub-manifold, this paper proposes a novel kernel PCA-assisted BO (KPCA-BO) algorithm, which embeds a non-linear sub-manifold in the search space and performs BO on this sub-manifold. Intuitively, constructing the GPR model on a lower-dimensional sub-manifold helps improve the modeling accuracy without requiring much more data from the objective function. Also, our approach defines the acquisition function on the lower-dimensional sub-manifold, making the acquisition optimization more manageable. We compare the performance of KPCA-BO to the vanilla BO and PCA-BO on the multi-modal problems of the COCO/BBOB benchmark suite. Empirical results show that KPCA-BO outperforms BO in terms of convergence speed on most test problems, and this benefit becomes more significant when the dimensionality increases. For the 60D functions, KPCA-BO surpasses PCA-BO in many test cases. Moreover, it efficiently reduces the CPU time required to train the GPR model and optimize the acquisition function compared to the vanilla BO.
    DeepAdversaries: Examining the Robustness of Deep Learning Models for Galaxy Morphology Classification. (arXiv:2112.14299v2 [cs.LG] UPDATED)
    Data processing and analysis pipelines in cosmological survey experiments introduce data perturbations that can significantly degrade the performance of deep learning-based models. Given the increased adoption of supervised deep learning methods for processing and analysis of cosmological survey data, the assessment of data perturbation effects and the development of methods that increase model robustness are increasingly important. In the context of morphological classification of galaxies, we study the effects of perturbations in imaging data. In particular, we examine the consequences of using neural networks when training on baseline data and testing on perturbed data. We consider perturbations associated with two primary sources: 1) increased observational noise as represented by higher levels of Poisson noise and 2) data processing noise incurred by steps such as image compression or telescope errors as represented by one-pixel adversarial attacks. We also test the efficacy of domain adaptation techniques in mitigating the perturbation-driven errors. We use classification accuracy, latent space visualizations, and latent space distance to assess model robustness. Without domain adaptation, we find that processing pixel-level errors easily flip the classification into an incorrect class and that higher observational noise makes the model trained on low-noise data unable to classify galaxy morphologies. On the other hand, we show that training with domain adaptation improves model robustness and mitigates the effects of these perturbations, improving the classification accuracy by 23% on data with higher observational noise. Domain adaptation also increases by a factor of ~2.3 the latent space distance between the baseline and the incorrectly classified one-pixel perturbed image, making the model more robust to inadvertent perturbations.
    An Extensive Data Processing Pipeline for MIMIC-IV. (arXiv:2204.13841v1 [cs.LG])
    An increasing amount of research is being devoted to applying machine learning methods to electronic health record (EHR) data for various clinical tasks. This growing area of research has exposed the limitation of accessibility of EHR datasets for all, as well as the reproducibility of different modeling frameworks. One reason for these limitations is the lack of standardized pre-processing pipelines. MIMIC is a freely available EHR dataset in a raw format that has been used in numerous studies. The absence of standardized pre-processing steps serves as a major barrier to the wider adoption of the dataset. It also leads to different cohorts being used in downstream tasks, limiting the ability to compare the results among similar studies. Contrasting studies also use various distinct performance metrics, which can greatly reduce the ability to compare model results. In this work, we provide an end-to-end fully customizable pipeline to extract, clean, and pre-process data; and to predict and evaluate the fourth version of the MIMIC dataset (MIMIC-IV) for ICU and non-ICU-related clinical time-series prediction tasks.
    Particle Swarm Optimization Based Demand Response Using Artificial Neural Network Based Load Prediction. (arXiv:2204.13990v1 [cs.NE])
    In the present study, a Particle Swarm Optimization (PSO) based Demand Response (DR) model, using Artificial Neural Network (ANN) to predict load is proposed. The electrical load and climatological data of a residential area in Austin city in Texas are used as the inputs of the ANN. Then, the outcomes with the day-ahead prices data are used to solve the load shifting and cost reduction problem. According to the results, the proposed model has the ability to decrease payment costs and peak load.
    Multi-Agent MDP Homomorphic Networks. (arXiv:2110.04495v2 [cs.LG] UPDATED)
    This paper introduces Multi-Agent MDP Homomorphic Networks, a class of networks that allows distributed execution using only local information, yet is able to share experience between global symmetries in the joint state-action space of cooperative multi-agent systems. In cooperative multi-agent systems, complex symmetries arise between different configurations of the agents and their local observations. For example, consider a group of agents navigating: rotating the state globally results in a permutation of the optimal joint policy. Existing work on symmetries in single agent reinforcement learning can only be generalized to the fully centralized setting, because such approaches rely on the global symmetry in the full state-action spaces, and these can result in correspondences across agents. To encode such symmetries while still allowing distributed execution we propose a factorization that decomposes global symmetries into local transformations. Our proposed factorization allows for distributing the computation that enforces global symmetries over local agents and local interactions. We introduce a multi-agent equivariant policy network based on this factorization. We show empirically on symmetric multi-agent problems that globally symmetric distributable policies improve data efficiency compared to non-equivariant baselines.
    Tailored Uncertainty Estimation for Deep Learning Systems. (arXiv:2204.13963v1 [cs.LG])
    Uncertainty estimation bears the potential to make deep learning (DL) systems more reliable. Standard techniques for uncertainty estimation, however, come along with specific combinations of strengths and weaknesses, e.g., with respect to estimation quality, generalization abilities and computational complexity. To actually harness the potential of uncertainty quantification, estimators are required whose properties closely match the requirements of a given use case. In this work, we propose a framework that, firstly, structures and shapes these requirements, secondly, guides the selection of a suitable uncertainty estimation method and, thirdly, provides strategies to validate this choice and to uncover structural weaknesses. By contributing tailored uncertainty estimation in this sense, our framework helps to foster trustworthy DL systems. Moreover, it anticipates prospective machine learning regulations that require, e.g., in the EU, evidences for the technical appropriateness of machine learning systems. Our framework provides such evidences for system components modeling uncertainty.
    Hyperbolic Hierarchical Knowledge Graph Embeddings for Link Prediction in Low Dimensions. (arXiv:2204.13704v1 [cs.LG])
    Knowledge graph embeddings (KGE) have been validated as powerful methods for inferring missing links in knowledge graphs (KGs) since they map entities into Euclidean space and treat relations as transformations of entities. Currently, some Euclidean KGE methods model semantic hierarchies prevalent in KGs and promote the performance of link prediction. For hierarchical data, instead of traditional Euclidean space, hyperbolic space as an embedding space has shown the promise of high fidelity and low memory consumption; however, existing hyperbolic KGE methods neglect to model them. To address this issue, we propose a novel KGE model -- hyperbolic hierarchical KGE (HypHKGE). To be specific, we first design the attention-based learnable curvatures for hyperbolic space to preserve rich semantic hierarchies. Moreover, we define the hyperbolic hierarchical transformations based on the theory of hyperbolic geometry, which utilize hierarchies that we preserved to infer the links. Experiments show that HypHKGE can effectively model semantic hierarchies in hyperbolic space and outperforms the state-of-the-art hyperbolic methods, especially in low dimensions.
    An Online Ensemble Learning Model for Detecting Attacks in Wireless Sensor Networks. (arXiv:2204.13814v1 [cs.NI])
    In today's modern world, the usage of technology is unavoidable and the rapid advances in the Internet and communication fields have resulted to expand the Wireless Sensor Network (WSN) technology. A huge number of sensing devices collect and/or generate numerous sensory data throughout time for a wide range of fields and applications. However, WSN has been proven to be vulnerable to security breaches, the harsh and unattended deployment of these networks, combined with their constrained resources and the volume of data generated introduce a major security concern. WSN applications are extremely critical, it is essential to build reliable solutions that involve fast and continuous mechanisms for online data stream analysis enabling the detection of attacks and intrusions. In this context, our aim is to develop an intelligent, efficient, and updatable intrusion detection system by applying an important machine learning concept known as ensemble learning in order to improve detection performance. Although ensemble models have been proven to be useful in offline learning, they have received less attention in streaming applications. In this paper, we examine the application of different homogeneous and heterogeneous online ensembles in sensory data analysis, on a specialized wireless sensor network-detection system (WSN-DS) dataset in order to classify four types of attacks: Blackhole attack, Grayhole, Flooding, and Scheduling among normal network traffic. Among the proposed novel online ensembles, both the heterogeneous ensemble consisting of an Adaptive Random Forest (ARF) combined with the Hoeffding Adaptive Tree (HAT) algorithm and the homogeneous ensemble HAT made up of 10 models achieved higher detection rates of 96.84% and 97.2%, respectively. The above models are efficient and effective in dealing with concept drift, while taking into account the resource constraints of WSNs.
    Bayesian Information Criterion for Event-based Multi-trial Ensemble data. (arXiv:2204.14096v1 [stat.ML])
    Transient recurring phenomena are ubiquitous in many scientific fields like neuroscience and meteorology. Time inhomogenous Vector Autoregressive Models (VAR) may be used to characterize peri-event system dynamics associated with such phenomena, and can be learned by exploiting multi-dimensional data gathering samples of the evolution of the system in multiple time windows comprising, each associated with one occurrence of the transient phenomenon, that we will call "trial". However, optimal VAR model order selection methods, commonly relying on the Akaike or Bayesian Information Criteria (AIC/BIC), are typically not designed for multi-trial data. Here we derive the BIC methods for multi-trial ensemble data which are gathered after the detection of the events. We show using simulated bivariate AR models that the multi-trial BIC is able to recover the real model order. We also demonstrate with simulated transient events and real data that the multi-trial BIC is able to estimate a sufficiently small model order for dynamic system modeling.
    Local Explanation of Dimensionality Reduction. (arXiv:2204.14012v1 [cs.LG])
    Dimensionality reduction (DR) is a popular method for preparing and analyzing high-dimensional data. Reduced data representations are less computationally intensive and easier to manage and visualize, while retaining a significant percentage of their original information. Aside from these advantages, these reduced representations can be difficult or impossible to interpret in most circumstances, especially when the DR approach does not provide further information about which features of the original space led to their construction. This problem is addressed by Interpretable Machine Learning, a subfield of Explainable Artificial Intelligence that addresses the opacity of machine learning models. However, current research on Interpretable Machine Learning has been focused on supervised tasks, leaving unsupervised tasks like Dimensionality Reduction unexplored. In this paper, we introduce LXDR, a technique capable of providing local interpretations of the output of DR techniques. Experiment results and two LXDR use case examples are presented to evaluate its usefulness.
    SwiftAgg: Communication-Efficient and Dropout-Resistant Secure Aggregation for Federated Learning with Worst-Case Security Guarantees. (arXiv:2202.04169v2 [cs.IT] UPDATED)
    We propose SwiftAgg, a novel secure aggregation protocol for federated learning systems, where a central server aggregates local models of $N$ distributed users, each of size $L$, trained on their local data, in a privacy-preserving manner. Compared with state-of-the-art secure aggregation protocols, SwiftAgg significantly reduces the communication overheads without any compromise on security. Specifically, in presence of at most $D$ dropout users, SwiftAgg achieves a users-to-server communication load of $(T+1)L$ and a users-to-users communication load of up to $(N-1)(T+D+1)L$, with a worst-case information-theoretic security guarantee, against any subset of up to $T$ semi-honest users who may also collude with the curious server. The key idea of SwiftAgg is to partition the users into groups of size $D+T+1$, then in the first phase, secret sharing and aggregation of the individual models are performed within each group, and then in the second phase, model aggregation is performed on $D+T+1$ sequences of users across the groups. If a user in a sequence drops out in the second phase, the rest of the sequence remain silent. This design allows only a subset of users to communicate with each other, and only the users in a single group to directly communicate with the server, eliminating the requirements of 1) all-to-all communication network across users; and 2) all users communicating with the server, for other secure aggregation protocols. This helps to substantially slash the communication costs of the system.
    Framework for Behavioral Disorder Detection Using Machine Learning and Application of Virtual Cognitive Behavioral Therapy in COVID-19 Pandemic. (arXiv:2204.13900v1 [cs.LG])
    In this modern world, people are becoming more self-centered and unsocial. On the other hand, people are stressed, becoming more anxious during COVID-19 pandemic situation and exhibits symptoms of behavioral disorder. To measure the symptoms of behavioral disorder, usually psychiatrist use long hour sessions and inputs from specific questionnaire. This process is time consuming and sometime is ineffective to detect the right behavioral disorder. Also, reserved people sometime hesitate to follow this process. We have created a digital framework which can detect behavioral disorder and prescribe virtual Cognitive Behavioral Therapy (vCBT) for recovery. By using this framework people can input required data that are highly responsible for the three behavioral disorders namely depression, anxiety and internet addiction. We have applied machine learning technique to detect specific behavioral disorder from samples. This system guides the user with basic understanding and treatment through vCBT from anywhere any time which would potentially be the steppingstone for the user to be conscious and pursue right treatment.
    Triformer: Triangular, Variable-Specific Attentions for Long Sequence Multivariate Time Series Forecasting--Full Version. (arXiv:2204.13767v1 [cs.LG])
    A variety of real-world applications rely on far future information to make decisions, thus calling for efficient and accurate long sequence multivariate time series forecasting. While recent attention-based forecasting models show strong abilities in capturing long-term dependencies, they still suffer from two key limitations. First, canonical self attention has a quadratic complexity w.r.t. the input time series length, thus falling short in efficiency. Second, different variables' time series often have distinct temporal dynamics, which existing studies fail to capture, as they use the same model parameter space, e.g., projection matrices, for all variables' time series, thus falling short in accuracy. To ensure high efficiency and accuracy, we propose Triformer, a triangular, variable-specific attention. (i) Linear complexity: we introduce a novel patch attention with linear complexity. When stacking multiple layers of the patch attentions, a triangular structure is proposed such that the layer sizes shrink exponentially, thus maintaining linear complexity. (ii) Variable-specific parameters: we propose a light-weight method to enable distinct sets of model parameters for different variables' time series to enhance accuracy without compromising efficiency and memory usage. Strong empirical evidence on four datasets from multiple domains justifies our design choices, and it demonstrates that Triformer outperforms state-of-the-art methods w.r.t. both accuracy and efficiency. This is an extended version of "Triformer: Triangular, Variable-Specific Attentions for Long Sequence Multivariate Time Series Forecasting", to appear in IJCAI 2022 [Cirstea et al., 2022a], including additional experimental results.
    Fairer LP-based Online Allocation via Analytic Center. (arXiv:2110.14621v3 [cs.DS] UPDATED)
    In this paper, we consider an online resource allocation problem where a decision maker accepts or rejects incoming customer requests irrevocably in order to maximize expected reward given limited resources. At each time, a new order/customer/bid is revealed with a request of some resource(s) and a reward. We consider a stochastic setting where all the orders are i.i.d. sampled from an unknown distribution. Such formulation arises from many classic applications such as the canonical (quantity-based) network revenue management problem and the Adwords problem. While the literature on the topic mainly focuses on regret minimization, our paper considers the \textit{fairness} aspect of the problem. On a high level, we define the fairness in a way that a fair online algorithm should treat similar agents/customers similarly, and the decision made for similar agents/customers should be consistent over time. To achieve this goal, we define the fair offline solution as the analytic center of the offline optimal solution set, and introduce \textit{cumulative unfairness} as the cumulative deviation from the online solutions to the fair offline solution over time. We propose a fair algorithm based on an interior-point LP solver and a mechanism that dynamically detects unfair resource spending. Our algorithm achieves cumulative unfairness on the scale of order $O(\log(T))$, while maintains the regret to be bounded without dependency on $T$. In addition, compared to the literature, our result is produced under less restrictive assumptions on the degeneracy of the underlying linear program.
    Convergence of gradient descent for deep neural networks. (arXiv:2203.16462v2 [cs.LG] UPDATED)
    Optimization by gradient descent has been one of main drivers of the "deep learning revolution". Yet, despite some recent progress for extremely wide networks, it remains an open problem to understand why gradient descent often converges to global minima when training deep neural networks. This article presents a new criterion for convergence of gradient descent to a global minimum, which is provably more powerful than the best available criteria from the literature, namely, the Lojasiewicz inequality and its generalizations. This criterion is used to show that gradient descent with proper initialization converges to a global minimum when training any feedforward neural network with smooth and strictly increasing activation functions, provided that the input dimension is greater than or equal to the number of data points.
    Machine Learning-Based GPS Multipath Detection Method Using Dual Antennas. (arXiv:2204.14001v1 [cs.NI])
    In urban areas, global navigation satellite system (GNSS) signals are often reflected or blocked by buildings, thus resulting in large positioning errors. In this study, we proposed a machine learning approach for global positioning system (GPS) multipath detection that uses dual antennas. A machine learning model that could classify GPS signal reception conditions was trained with several GPS measurements selected as suggested features. We applied five features for machine learning, including a feature obtained from the dual antennas, and evaluated the classification performance of the model, after applying four machine learning algorithms: gradient boosting decision tree (GBDT), random forest, decision tree, and K-nearest neighbor (KNN). It was found that a classification accuracy of 82%-96% was achieved when the test data set was collected at the same locations as those of the training data set. However, when the test data set was collected at locations different from those of the training data, a classification accuracy of 44%-77% was obtained.
    Depth Estimation with Simplified Transformer. (arXiv:2204.13791v1 [cs.CV])
    Transformer and its variants have shown state-of-the-art results in many vision tasks recently, ranging from image classification to dense prediction. Despite of their success, limited work has been reported on improving the model efficiency for deployment in latency-critical applications, such as autonomous driving and robotic navigation. In this paper, we aim at improving upon the existing transformers in vision, and propose a method for self-supervised monocular Depth Estimation with Simplified Transformer (DEST), which is efficient and particularly suitable for deployment on GPU-based platforms. Through strategic design choices, our model leads to significant reduction in model size, complexity, as well as inference latency, while achieving superior accuracy as compared to state-of-the-art. We also show that our design generalize well to other dense prediction task without bells and whistles.
    Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks. (arXiv:2111.02278v2 [cs.LG] UPDATED)
    Understanding the properties of neural networks trained via stochastic gradient descent (SGD) is at the heart of the theory of deep learning. In this work, we take a mean-field view, and consider a two-layer ReLU network trained via SGD for a univariate regularized regression problem. Our main result is that SGD is biased towards a simple solution: at convergence, the ReLU network implements a piecewise linear map of the inputs, and the number of "knot" points - i.e., points where the tangent of the ReLU network estimator changes - between two consecutive training inputs is at most three. In particular, as the number of neurons of the network grows, the SGD dynamics is captured by the solution of a gradient flow and, at convergence, the distribution of the weights approaches the unique minimizer of a related free energy, which has a Gibbs form. Our key technical contribution consists in the analysis of the estimator resulting from this minimizer: we show that its second derivative vanishes everywhere, except at some specific locations which represent the "knot" points. We also provide empirical evidence that knots at locations distinct from the data points might occur, as predicted by our theory.
    Stochastic Video Prediction with Structure and Motion. (arXiv:2203.10528v2 [cs.CV] UPDATED)
    While stochastic video prediction models enable future prediction under uncertainty, they mostly fail to model the complex dynamics of real-world scenes. For example, they cannot provide reliable predictions for scenes with a moving camera and independently moving foreground objects in driving scenarios. The existing methods fail to fully capture the dynamics of the structured world by only focusing on changes in pixels. In this paper, we assume that there is an underlying process creating observations in a video and propose to factorize it into static and dynamic components. We model the static part based on the scene structure and the ego-motion of the vehicle, and the dynamic part based on the remaining motion of the dynamic objects. By learning separate distributions of changes in foreground and background, we can decompose the scene into static and dynamic parts and separately model the change in each. Our experiments demonstrate that disentangling structure and motion helps stochastic video prediction, leading to better future predictions in complex driving scenarios on two real-world driving datasets, KITTI and Cityscapes.
    Feature extraction using Spectral Clustering for Gene Function Prediction using Hierarchical Multi-label Classification. (arXiv:2203.13551v2 [cs.LG] UPDATED)
    Gene annotation addresses the problem of predicting unknown associations between gene and functions (e.g., biological processes) of a specific organism. Despite recent advances, the cost and time demanded by annotation procedures that rely largely on in vivo biological experiments remain prohibitively high. This paper presents a novel in silico approach for to the annotation problem that combines cluster analysis and hierarchical multi-label classification (HMC). The approach uses spectral clustering to extract new features from the gene co-expression network (GCN) and enrich the prediction task. HMC is used to build multiple estimators that consider the hierarchical structure of gene functions. The proposed approach is applied to a case study on Zea mays, one of the most dominant and productive crops in the world. The results illustrate how in silico approaches are key to reduce the time and costs of gene annotation. More specifically, they highlight the importance of: (i) building new features that represent the structure of gene relationships in GCNs to annotate genes; and (ii) taking into account the structure of biological processes to obtain consistent predictions.
    Statistical applications of contrastive learning. (arXiv:2204.13999v1 [cs.LG])
    The likelihood function plays a crucial role in statistical inference and experimental design. However, it is computationally intractable for several important classes of statistical models, including energy-based models and simulator-based models. Contrastive learning is an intuitive and computationally feasible alternative to likelihood-based learning. We here first provide an introduction to contrastive learning and then show how we can use it to derive methods for diverse statistical problems, namely parameter estimation for energy-based models, Bayesian inference for simulator-based models, as well as experimental design.
    Dynamic Diagnosis of the Progress and Shortcomings of Student Learning using Machine Learning based on Cognitive, Social, and Emotional Features. (arXiv:2204.13989v1 [cs.CY])
    Student diversity, like academic background, learning styles, career and life goals, ethnicity, age, social and emotional characteristics, course load and work schedule, offers unique opportunities in education, like learning new skills, peer mentoring and example setting. But student diversity can be challenging too as it adds variability in the way in which students learn and progress over time. A single teaching approach is likely to be ineffective and result in students not meeting their potential. Automated support could address limitations of traditional teaching by continuously assessing student learning and implementing needed interventions. This paper discusses a novel methodology based on data analytics and Machine Learning to measure and causally diagnose the progress and shortcomings of student learning, and then utilizes the insight gained on individuals to optimize learning. Diagnosis pertains to dynamic diagnostic formative assessment, which aims to uncover the causes of learning shortcomings. The methodology groups learning difficulties into four categories: recall from memory, concept adjustment, concept modification, and problem decomposition into sub-goals (sub-problems) and concept combination. Data models are predicting the occurrence of each of the four challenge types, as well as a student's learning trajectory. The models can be used to automatically create real-time, student-specific interventions (e.g., learning cues) to address less understood concepts. We envision that the system will enable new adaptive pedagogical approaches to unleash student learning potential through customization of the course material to the background, abilities, situation, and progress of each student; and leveraging diversity-related learning experiences.
    Science Checker: Extractive-Boolean Question Answering For Scientific Fact Checking. (arXiv:2204.12263v2 [cs.CL] UPDATED)
    With the explosive growth of scientific publications, making the synthesis of scientific knowledge and fact checking becomes an increasingly complex task. In this paper, we propose a multi-task approach for verifying the scientific questions based on a joint reasoning from facts and evidence in research articles. We propose an intelligent combination of (1) an automatic information summarization and (2) a Boolean Question Answering which allows to generate an answer to a scientific question from only extracts obtained after summarization. Thus on a given topic, our proposed approach conducts structured content modeling based on paper abstracts to answer a scientific question while highlighting texts from paper that discuss the topic. We based our final system on an end-to-end Extractive Question Answering (EQA) combined with a three outputs classification model to perform in-depth semantic understanding of a question to illustrate the aggregation of multiple responses. With our light and fast proposed architecture, we achieved an average error rate of 4% and a F1-score of 95.6%. Our results are supported via experiments with two QA models (BERT, RoBERTa) over 3 Million Open Access (OA) articles in the medical and health domains on Europe PMC.
    Toward Degradation-Robust Voice Conversion. (arXiv:2110.07537v3 [eess.AS] UPDATED)
    Any-to-any voice conversion technologies convert the vocal timbre of an utterance to any speaker even unseen during training. Although there have been several state-of-the-art any-to-any voice conversion models, they were all based on clean utterances to convert successfully. However, in real-world scenarios, it is difficult to collect clean utterances of a speaker, and they are usually degraded by noises or reverberations. It thus becomes highly desired to understand how these degradations affect voice conversion and build a degradation-robust model. We report in this paper the first comprehensive study on the degradation robustness of any-to-any voice conversion. We show that the performance of state-of-the-art models nowadays was severely hampered given degraded utterances. To this end, we then propose speech enhancement concatenation and denoising training to improve the robustness. In addition to common degradations, we also consider adversarial noises, which alter the model output significantly yet are human-imperceptible. It was shown that both concatenations with off-the-shelf speech enhancement models and denoising training on voice conversion models could improve the robustness, while each of them had pros and cons.  ( 2 min )
    One-Way Matching of Datasets with Low Rank Signals. (arXiv:2204.13858v1 [math.ST])
    We study one-way matching of a pair of datasets with low rank signals. Under a stylized model, we first derive information-theoretic limits of matching. We then show that linear assignment with projected data achieves fast rates of convergence and sometimes even minimax rate optimality for this task. The theoretical error bounds are corroborated by simulated examples. Furthermore, we illustrate practical use of the matching procedure on two single-cell data examples.
    Post-hoc Interpretability for Neural NLP: A Survey. (arXiv:2108.04840v4 [cs.CL] UPDATED)
    Neural networks for NLP are becoming increasingly complex and widespread, and there is a growing concern if these models are responsible to use. Explaining models helps to address the safety and ethical concerns and is essential for accountability. Interpretability serves to provide these explanations in terms that are understandable to humans. Additionally, post-hoc methods provide explanations after a model is learned and are generally model-agnostic. This survey provides a categorization of how recent post-hoc interpretability methods communicate explanations to humans, it discusses each method in-depth, and how they are validated, as the latter is often a common concern.
    AGIC: Approximate Gradient Inversion Attack on Federated Learning. (arXiv:2204.13784v1 [cs.LG])
    Federated learning is a private-by-design distributed learning paradigm where clients train local models on their own data before a central server aggregates their local updates to compute a global model. Depending on the aggregation method used, the local updates are either the gradients or the weights of local learning models. Recent reconstruction attacks apply a gradient inversion optimization on the gradient update of a single minibatch to reconstruct the private data used by clients during training. As the state-of-the-art reconstruction attacks solely focus on single update, realistic adversarial scenarios are overlooked, such as observation across multiple updates and updates trained from multiple mini-batches. A few studies consider a more challenging adversarial scenario where only model updates based on multiple mini-batches are observable, and resort to computationally expensive simulation to untangle the underlying samples for each local step. In this paper, we propose AGIC, a novel Approximate Gradient Inversion Attack that efficiently and effectively reconstructs images from both model or gradient updates, and across multiple epochs. In a nutshell, AGIC (i) approximates gradient updates of used training samples from model updates to avoid costly simulation procedures, (ii) leverages gradient/model updates collected from multiple epochs, and (iii) assigns increasing weights to layers with respect to the neural network structure for reconstruction quality. We extensively evaluate AGIC on three datasets, CIFAR-10, CIFAR-100 and ImageNet. Our results show that AGIC increases the peak signal-to-noise ratio (PSNR) by up to 50% compared to two representative state-of-the-art gradient inversion attacks. Furthermore, AGIC is faster than the state-of-the-art simulation based attack, e.g., it is 5x faster when attacking FedAvg with 8 local steps in between model updates.
    Modular Domain Adaptation. (arXiv:2204.14213v1 [cs.CL])
    Off-the-shelf models are widely used by computational social science researchers to measure properties of text, such as sentiment. However, without access to source data it is difficult to account for domain shift, which represents a threat to validity. Here, we treat domain adaptation as a modular process that involves separate model producers and model consumers, and show how they can independently cooperate to facilitate more accurate measurements of text. We introduce two lightweight techniques for this scenario, and demonstrate that they reliably increase out-of-domain accuracy on four multi-domain text classification datasets when used with linear and contextual embedding models. We conclude with recommendations for model producers and consumers, and release models and replication code to accompany this paper.  ( 2 min )
    Physical Deep Learning with Biologically Plausible Training Method. (arXiv:2204.13991v1 [cs.NE])
    The ever-growing demand for further advances in artificial intelligence motivated research on unconventional computation based on analog physical devices. While such computation devices mimic brain-inspired analog information processing, learning procedures still relies on methods optimized for digital processing such as backpropagation. Here, we present physical deep learning by extending a biologically plausible training algorithm called direct feedback alignment. As the proposed method is based on random projection with arbitrary nonlinear activation, we can train a physical neural network without knowledge about the physical system. In addition, we can emulate and accelerate the computation for this training on a simple and scalable physical system. We demonstrate the proof-of-concept using a hierarchically connected optoelectronic recurrent neural network called deep reservoir computer. By constructing an FPGA-assisted optoelectronic benchtop, we confirmed the potential for accelerated computation with competitive performance on benchmarks. Our results provide practical solutions for the training and acceleration of neuromorphic computation.  ( 2 min )
    Neighbor-Based Optimized Logistic Regression Machine Learning Model For Electric Vehicle Occupancy Detection. (arXiv:2204.13702v1 [cs.LG])
    This paper presents an optimized logistic regression machine learning model that predicts the occupancy of an Electric Vehicle (EV) charging station given the occupancy of neighboring stations. The model was optimized for the time of day. Trained on data from 57 EV charging stations around the University of California San Diego campus, the model achieved an 88.43% average accuracy and 92.23% maximum accuracy in predicting occupancy, outperforming a persistence model benchmark.  ( 2 min )
    HyperJump: Accelerating HyperBand via Risk Modelling. (arXiv:2108.02479v3 [cs.LG] UPDATED)
    In the literature on hyper-parameter tuning, a number of recent solutions rely on low-fidelity observations (e.g., training with sub-sampled datasets or for short periods of time) to extrapolate good configurations to use when performing full training. Among these, HyperBand is arguably one of the most popular solutions, due to its efficiency and theoretically provable robustness. In this work, we introduce HyperJump, a new approach that builds on HyperBand's robust search strategy and complements it with novel model-based risk analysis techniques that accelerate the search by \textit{jumping} the evaluation of low risk configurations, i.e., configurations that are likely to be discarded by HyperBand. We evaluate HyperJump on a suite of hyper-parameter optimization problems and show that it provides over one-order of magnitude speed-ups, both in sequential and parallel deployments, on a variety of deep learning, kernel-based learning, and neural architectural search problems when compared to HyperBand and to several state-of-the-art optimizers.  ( 2 min )
    A Collection of Quality Diversity Optimization Problems Derived from Hyperparameter Optimization of Machine Learning Models. (arXiv:2204.14061v1 [cs.LG])
    The goal of Quality Diversity Optimization is to generate a collection of diverse yet high-performing solutions to a given problem at hand. Typical benchmark problems are, for example, finding a repertoire of robot arm configurations or a collection of game playing strategies. In this paper, we propose a set of Quality Diversity Optimization problems that tackle hyperparameter optimization of machine learning models - a so far underexplored application of Quality Diversity Optimization. Our benchmark problems involve novel feature functions, such as interpretability or resource usage of models. To allow for fast and efficient benchmarking, we build upon YAHPO Gym, a recently proposed open source benchmarking suite for hyperparameter optimization that makes use of high performing surrogate models and returns these surrogate model predictions instead of evaluating the true expensive black box function. We present results of an initial experimental study comparing different Quality Diversity optimizers on our benchmark problems. Furthermore, we discuss future directions and challenges of Quality Diversity Optimization in the context of hyperparameter optimization.  ( 2 min )
    Searching for Efficient Neural Architectures for On-Device ML on Edge TPUs. (arXiv:2204.14007v1 [cs.DC])
    On-device ML accelerators are becoming a standard in modern mobile system-on-chips (SoC). Neural architecture search (NAS) comes to the rescue for efficiently utilizing the high compute throughput offered by these accelerators. However, existing NAS frameworks have several practical limitations in scaling to multiple tasks and different target platforms. In this work, we provide a two-pronged approach to this challenge: (i) a NAS-enabling infrastructure that decouples model cost evaluation, search space design, and the NAS algorithm to rapidly target various on-device ML tasks, and (ii) search spaces crafted from group convolution based inverted bottleneck (IBN) variants that provide flexible quality/performance trade-offs on ML accelerators, complementing the existing full and depthwise convolution based IBNs. Using this approach we target a state-of-the-art mobile platform, Google Tensor SoC, and demonstrate neural architectures that improve the quality-performance pareto frontier for various computer vision (classification, detection, segmentation) as well as natural language processing tasks.  ( 2 min )
    Momentum-based variance-reduced proximal stochastic gradient method for composite nonconvex stochastic optimization. (arXiv:2006.00425v3 [math.OC] UPDATED)
    Stochastic gradient methods (SGMs) have been extensively used for solving stochastic problems or large-scale machine learning problems. Recent works employ various techniques to improve the convergence rate of SGMs for both convex and nonconvex cases. Most of them require a large number of samples in some or all iterations of the improved SGMs. In this paper, we propose a new SGM, named PStorm, for solving nonconvex nonsmooth stochastic problems. With a momentum-based variance reduction technique, PStorm can achieve the optimal complexity result $O(\varepsilon^{-3})$ to produce a stochastic $\varepsilon$-stationary solution, if a mean-squared smoothness condition holds. Different from existing optimal methods, PStorm can achieve the ${O}(\varepsilon^{-3})$ result by using only one or $O(1)$ samples in every update. With this property, PStorm can be applied to online learning problems that favor real-time decisions based on one or $O(1)$ new observations. In addition, for large-scale machine learning problems, PStorm can generalize better by small-batch training than other optimal methods that require large-batch training and the vanilla SGM, as we demonstrate on training a sparse fully-connected neural network and a sparse convolutional neural network.  ( 2 min )
    Flamingo: a Visual Language Model for Few-Shot Learning. (arXiv:2204.14198v1 [cs.CV])
    Building models that can be rapidly adapted to numerous tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research. We introduce Flamingo, a family of Visual Language Models (VLM) with this ability. Flamingo models include key architectural innovations to: (i) bridge powerful pretrained vision-only and language-only models, (ii) handle sequences of arbitrarily interleaved visual and textual data, and (iii) seamlessly ingest images or videos as inputs. Thanks to their flexibility, Flamingo models can be trained on large-scale multimodal web corpora containing arbitrarily interleaved text and images, which is key to endow them with in-context few-shot learning capabilities. We perform a thorough evaluation of the proposed Flamingo models, exploring and measuring their ability to rapidly adapt to a variety of image and video understanding benchmarks. These include open-ended tasks such as visual question-answering, where the model is prompted with a question which it has to answer, captioning tasks, which evaluate the ability to describe a scene or an event, and close-ended tasks such as multiple choice visual question-answering. For tasks lying anywhere on this spectrum, we demonstrate that a single Flamingo model can achieve a new state of the art for few-shot learning, simply by prompting the model with task-specific examples. On many of these benchmarks, Flamingo actually surpasses the performance of models that are fine-tuned on thousands of times more task-specific data.  ( 2 min )
    MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation. (arXiv:2111.12707v3 [cs.CV] UPDATED)
    Estimating 3D human poses from monocular videos is a challenging task due to depth ambiguity and self-occlusion. Most existing works attempt to solve both issues by exploiting spatial and temporal relationships. However, those works ignore the fact that it is an inverse problem where multiple feasible solutions (i.e., hypotheses) exist. To relieve this limitation, we propose a Multi-Hypothesis Transformer (MHFormer) that learns spatio-temporal representations of multiple plausible pose hypotheses. In order to effectively model multi-hypothesis dependencies and build strong relationships across hypothesis features, the task is decomposed into three stages: (i) Generate multiple initial hypothesis representations; (ii) Model self-hypothesis communication, merge multiple hypotheses into a single converged representation and then partition it into several diverged hypotheses; (iii) Learn cross-hypothesis communication and aggregate the multi-hypothesis features to synthesize the final 3D pose. Through the above processes, the final representation is enhanced and the synthesized pose is much more accurate. Extensive experiments show that MHFormer achieves state-of-the-art results on two challenging datasets: Human3.6M and MPI-INF-3DHP. Without bells and whistles, its performance surpasses the previous best result by a large margin of 3% on Human3.6M. Code and models are available at \url{https://github.com/Vegetebird/MHFormer}.  ( 2 min )
    Preoperative brain tumor imaging: models and software for segmentation and standardized reporting. (arXiv:2204.14199v1 [eess.IV])
    For patients suffering from brain tumor, prognosis estimation and treatment decisions are made by a multidisciplinary team based on a set of preoperative MR scans. Currently, the lack of standardized and automatic methods for tumor detection and generation of clinical reports represents a major hurdle. In this study, we investigate glioblastomas, lower grade gliomas, meningiomas, and metastases, through four cohorts of up to 4000 patients. Tumor segmentation models were trained using the AGU-Net architecture with different preprocessing steps and protocols. Segmentation performances were assessed in-depth using a wide-range of voxel and patient-wise metrics covering volume, distance, and probabilistic aspects. Finally, two software solutions have been developed, enabling an easy use of the trained models and standardized generation of clinical reports: Raidionics and Raidionics-Slicer. Segmentation performances were quite homogeneous across the four different brain tumor types, with an average true positive Dice ranging between 80% and 90%, patient-wise recall between 88% and 98%, and patient-wise precision around 95%. With our Raidionics software, running on a desktop computer with CPU support, tumor segmentation can be performed in 16 to 54 seconds depending on the dimensions of the MRI volume. For the generation of a standardized clinical report, including the tumor segmentation and features computation, 5 to 15 minutes are necessary. All trained models have been made open-access together with the source code for both software solutions and validation metrics computation. In the future, an automatic classification of the brain tumor type would be necessary to replace manual user input. Finally, the inclusion of post-operative segmentation in both software solutions will be key for generating complete post-operative standardized clinical reports.  ( 3 min )
    Automatic Machine Learning for Multi-Receiver CNN Technology Classifiers. (arXiv:2204.13819v1 [cs.LG])
    Convolutional Neural Networks (CNNs) are one of the most studied family of deep learning models for signal classification, including modulation, technology, detection, and identification. In this work, we focus on technology classification based on raw I/Q samples collected from multiple synchronized receivers. As an example use case, we study protocol identification of Wi-Fi, LTE-LAA, and 5G NR-U technologies that coexist over the 5 GHz Unlicensed National Information Infrastructure (U-NII) bands. Designing and training accurate CNN classifiers involve significant time and effort that goes into fine-tuning a model's architectural settings and determining the appropriate hyperparameter configurations, such as learning rate and batch size. We tackle the former by defining architectural settings themselves as hyperparameters. We attempt to automatically optimize these architectural parameters, along with other preprocessing (e.g., number of I/Q samples within each classifier input) and learning hyperparameters, by forming a Hyperparameter Optimization (HyperOpt) problem, which we solve in a near-optimal fashion using the Hyperband algorithm. The resulting near-optimal CNN (OCNN) classifier is then used to study classification accuracy for OTA as well as simulations datasets, considering various SNR values. We show that the number of receivers to construct multi-channel inputs for CNNs should be defined as a preprocessing hyperparameter to be optimized via Hyperband. OTA results reveal that our OCNN classifiers improve classification accuracy by 24.58% compared to manually tuned CNNs. We also study the effect of min-max normalization of I/Q samples within each classifier's input on generalization accuracy over simulated datasets with SNRs other than training set's SNR and show an average of 108.05% improvement when I/Q samples are normalized.  ( 2 min )
    A Neural Network-enhanced Reproducing Kernel Particle Method for Modeling Strain Localization. (arXiv:2204.13821v1 [cs.CE])
    Modeling the localized intensive deformation in a damaged solid requires highly refined discretization for accurate prediction, which significantly increases the computational cost. Although adaptive model refinement can be employed for enhanced effectiveness, it is cumbersome for the traditional mesh-based methods to perform while modeling the evolving localizations. In this work, neural network-enhanced reproducing kernel particle method (NN-RKPM) is proposed, where the location, orientation, and shape of the solution transition near a localization is automatically captured by the NN approximation via a block-level neural network optimization. The weights and biases in the blocked parametrization network control the location and orientation of the localization. The designed basic four-kernel NN block is capable of capturing a triple junction or a quadruple junction topological pattern, while more complicated localization topological patters are captured by the superposition of multiple four-kernel NN blocks. The standard RK approximation is then utilized to approximate the smooth part of the solution, which permits a much coarser discretization than the high-resolution discretization needed to capture sharp solution transitions with the conventional methods. A regularization of the neural network approximation is additionally introduced for discretization-independent material responses. The effectiveness of the proposed NN-RKPM is verified by a series of numerical verifications.  ( 2 min )
    Industry-academia research collaboration and knowledge co-creation: Patterns and anti-patterns. (arXiv:2204.14180v1 [cs.SE])
    Increasing the impact of software engineering research in the software industry and the society at large has long been a concern of high priority for the software engineering community. The problem of two cultures, research conducted in a vacuum (disconnected from the real world), or misaligned time horizons are just some of the many complex challenges standing in the way of successful industry-academia collaborations. This paper reports on the experience of research collaboration and knowledge co-creation between industry and academia in software engineering as a way to bridge the research-practice collaboration gap. Our experience spans 14 years of collaboration between researchers in software engineering and the European and Norwegian software and IT industry. Using the participant observation and interview methods we have collected and afterwards analyzed an extensive record of qualitative data. Drawing upon the findings made and the experience gained, we provide a set of 14 patterns and 14 anti-patterns for industry-academia collaborations, aimed to support other researchers and practitioners in establishing and running research collaboration projects in software engineering.  ( 2 min )
    Application of machine learning methods to detect and classify Core images using GAN and texture recognition. (arXiv:2204.14224v1 [cs.CV])
    During exploration campaigns, oil companies rely heavily on drill core samples as they provide valuable geological information that helps them find important oil deposits. Traditional core logging techniques are laborious and subjective. Core imaging, a new technique in the oil industry, is used to supplement analysis by rapidly characterising large quantities of drill cores in a nondestructive and noninvasive manner. In this paper, we will present the problem of core detection and classification. The first problem is detecting the cores and segmenting the holes in images by using Faster RCNN and Mask RCNN models respectively. The second problem is filling the hole in the core image by applying the Generative adversarial network(GAN) technique and using Contextual Residual Aggregation(CRA) which creates high frequency residual for missing contents in images. And finally applying Texture recognition models for the classification of core images.  ( 2 min )
    Task Embedding Temporal Convolution Networks for Transfer Learning Problems in Renewable Power Time-Series Forecast. (arXiv:2204.13908v1 [cs.LG])
    Task embeddings in multi-layer perceptrons for multi-task learning and inductive transfer learning in renewable power forecasts have recently been introduced. In many cases, this approach improves the forecast error and reduces the required training data. However, it does not take the seasonal influences in power forecasts within a day into account, i.e., the diurnal cycle. Therefore, we extended this idea to temporal convolutional networks to consider those seasonalities. We propose transforming the embedding space, which contains the latent similarities between tasks, through convolution and providing these results to the network's residual block. The proposed architecture significantly improves up to 25 percent for multi-task learning for power forecasts on the EuropeWindFarm and GermanSolarFarm dataset compared to the multi-layer perceptron approach. Based on the same data, we achieve a ten percent improvement for the wind datasets and more than 20 percent in most cases for the solar dataset for inductive transfer learning without catastrophic forgetting. Finally, we are the first proposing zero-shot learning for renewable power forecasts to provide predictions even if no training data is available.  ( 2 min )
    CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers. (arXiv:2204.14217v1 [cs.CV])
    The development of the transformer-based text-to-image models are impeded by its slow generation and complexity for high-resolution images. In this work, we put forward a solution based on hierarchical transformers and local parallel auto-regressive generation. We pretrain a 6B-parameter transformer with a simple and flexible self-supervised task, Cross-modal general language model (CogLM), and finetune it for fast super-resolution. The new text-to-image system, CogView2, shows very competitive generation compared to concurrent state-of-the-art DALL-E-2, and naturally supports interactive text-guided editing on images.  ( 2 min )
    Escaping Spurious Local Minima of Low-Rank Matrix Factorization Through Convex Lifting. (arXiv:2204.14067v1 [cs.LG])
    This work proposes a rapid global solver for nonconvex low-rank matrix factorization (MF) problems that we name MF-Global. Through convex lifting steps, our method efficiently escapes saddle points and spurious local minima ubiquitous in noisy real-world data, and is guaranteed to always converge to the global optima. Moreover, the proposed approach adaptively adjusts the rank for the factorization and provably identifies the optimal rank for MF automatically in the course of optimization through tools of manifold identification, and thus it also spends significantly less time on parameter tuning than existing MF methods, which require an exhaustive search for this optimal rank. On the other hand, when compared to methods for solving the lifted convex form only, MF-Global leads to significantly faster convergence and much shorter running time. Experiments on real-world large-scale recommendation system problems confirm that MF-Global can indeed effectively escapes spurious local solutions at which existing MF approaches stuck, and is magnitudes faster than state-of-the-art algorithms for the lifted convex form.  ( 2 min )
    Learned Gradient of a Regularizer for Plug-and-Play Gradient Descent. (arXiv:2204.13940v1 [eess.IV])
    The Plug-and-Play (PnP) framework allows integrating advanced image denoising priors into optimization algorithms, to efficiently solve a variety of image restoration tasks. The Plug-and-Play alternating direction method of multipliers (ADMM) and the Regularization by Denoising (RED) algorithms are two examples of such methods that made a breakthrough in image restoration. However, while the former method only applies to proximal algorithms, it has recently been shown that there exists no regularization that explains the RED algorithm when the denoisers lack Jacobian symmetry, which happen to be the case of most practical denoisers. To the best of our knowledge, there exists no method for training a network that directly represents the gradient of a regularizer, which can be directly used in Plug-and-Play gradient-based algorithms. We show that it is possible to train a denoiser along with a network that corresponds to the gradient of its regularizer. We use this gradient of the regularizer in gradient-based optimization methods and obtain better results comparing to other generic Plug-and-Play approaches. We also show that the regularizer can be used as a pre-trained network for unrolled gradient descent. Lastly, we show that the resulting denoiser allows for a quick convergence of the Plug-and-Play ADMM.  ( 2 min )
    Unsupervised Reinforcement Learning for Transferable Manipulation Skill Discovery. (arXiv:2204.13906v1 [cs.RO])
    Current reinforcement learning (RL) in robotics often experiences difficulty in generalizing to new downstream tasks due to the innate task-specific training paradigm. To alleviate it, unsupervised RL, a framework that pre-trains the agent in a task-agnostic manner without access to the task-specific reward, leverages active exploration for distilling diverse experience into essential skills or reusable knowledge. For exploiting such benefits also in robotic manipulation, we propose an unsupervised method for transferable manipulation skill discovery that ties structured exploration toward interacting behavior and transferable skill learning. It not only enables the agent to learn interaction behavior, the key aspect of the robotic manipulation learning, without access to the environment reward, but also to generalize to arbitrary downstream manipulation tasks with the learned task-agnostic skills. Through comparative experiments, we show that our approach achieves the most diverse interacting behavior and significantly improves sample efficiency in downstream tasks including the extension to multi-object, multitask problems.  ( 2 min )
    3D Common Corruptions and Data Augmentation. (arXiv:2203.01441v3 [cs.CV] UPDATED)
    We introduce a set of image transformations that can be used as corruptions to evaluate the robustness of models as well as data augmentation mechanisms for training neural networks. The primary distinction of the proposed transformations is that, unlike existing approaches such as Common Corruptions, the geometry of the scene is incorporated in the transformations -- thus leading to corruptions that are more likely to occur in the real world. We also introduce a set of semantic corruptions (e.g. natural object occlusions). We show these transformations are `efficient' (can be computed on-the-fly), `extendable' (can be applied on most image datasets), expose vulnerability of existing models, and can effectively make models more robust when employed as `3D data augmentation' mechanisms. The evaluations on several tasks and datasets suggest incorporating 3D information into benchmarking and training opens up a promising direction for robustness research.
    Tag-assisted Multimodal Sentiment Analysis under Uncertain Missing Modalities. (arXiv:2204.13707v1 [cs.LG])
    Multimodal sentiment analysis has been studied under the assumption that all modalities are available. However, such a strong assumption does not always hold in practice, and most of multimodal fusion models may fail when partial modalities are missing. Several works have addressed the missing modality problem; but most of them only considered the single modality missing case, and ignored the practically more general cases of multiple modalities missing. To this end, in this paper, we propose a Tag-Assisted Transformer Encoder (TATE) network to handle the problem of missing uncertain modalities. Specifically, we design a tag encoding module to cover both the single modality and multiple modalities missing cases, so as to guide the network's attention to those missing modalities. Besides, we adopt a new space projection pattern to align common vectors. Then, a Transformer encoder-decoder network is utilized to learn the missing modality features. At last, the outputs of the Transformer encoder are used for the final sentiment classification. Extensive experiments are conducted on CMU-MOSI and IEMOCAP datasets, showing that our method can achieve significant improvements compared with several baselines.  ( 2 min )
    Learning cosmology and clustering with cosmic graphs. (arXiv:2204.13713v1 [astro-ph.CO])
    We train deep learning models on thousands of galaxy catalogues from the state-of-the-art hydrodynamic simulations of the CAMELS project to perform regression and inference. We employ Graph Neural Networks (GNNs), architectures designed to work with irregular and sparse data, like the distribution of galaxies in the Universe. We first show that GNNs can learn to compute the power spectrum of galaxy catalogues with a few percent accuracy. We then train GNNs to perform likelihood-free inference at the galaxy-field level. Our models are able to infer the value of $\Omega_{\rm m}$ with a $\sim12\%-13\%$ accuracy just from the positions of $\sim1000$ galaxies in a volume of $(25~h^{-1}{\rm Mpc})^3$ at $z=0$ while accounting for astrophysical uncertainties as modelled in CAMELS. Incorporating information from galaxy properties, such as stellar mass, stellar metallicity, and stellar radius, increases the accuracy to $4\%-8\%$. Our models are built to be translational and rotational invariant, and they can extract information from any scale larger than the minimum distance between two galaxies. However, our models are not completely robust: testing on simulations run with a different subgrid physics than the ones used for training does not yield as accurate results.  ( 2 min )
    Probabilistic Permutation Graph Search: Black-Box Optimization for Fairness in Ranking. (arXiv:2204.13765v1 [cs.LG])
    There are several measures for fairness in ranking, based on different underlying assumptions and perspectives. PL optimization with the REINFORCE algorithm can be used for optimizing black-box objective functions over permutations. In particular, it can be used for optimizing fairness measures. However, though effective for queries with a moderate number of repeating sessions, PL optimization has room for improvement for queries with a small number of repeating sessions. In this paper, we present a novel way of representing permutation distributions, based on the notion of permutation graphs. Similar to PL, our distribution representation, called PPG, can be used for black-box optimization of fairness. Different from PL, where pointwise logits are used as the distribution parameters, in PPG pairwise inversion probabilities together with a reference permutation construct the distribution. As such, the reference permutation can be set to the best sampled permutation regarding the objective function, making PPG suitable for both deterministic and stochastic rankings. Our experiments show that PPG, while comparable to PL for larger session repetitions (i.e., stochastic ranking), improves over PL for optimizing fairness metrics for queries with one session (i.e., deterministic ranking). Additionally, when accurate utility estimations are available, e.g., in tabular models, the performance of PPG in fairness optimization is significantly boosted compared to lower quality utility estimations from a learning to rank model, leading to a large performance gap with PL. Finally, the pairwise probabilities make it possible to impose pairwise constraints such as "item $d_1$ should always be ranked higher than item $d_2$." Such constraints can be used to simultaneously optimize the fairness metric and control another objective such as ranking performance.  ( 2 min )
    Fast Sampling of Diffusion Models with Exponential Integrator. (arXiv:2204.13902v1 [cs.LG])
    The past few years have witnessed the great success of Diffusion models~(DMs) in generating high-fidelity samples in generative modeling tasks. A major limitation of the DM is its notoriously slow sampling procedure which normally requires hundreds to thousands of time discretization steps of the learned diffusion process to reach the desired accuracy. Our goal is to develop a fast sampling method for DMs with much less number of steps while retaining high sample quality. To this end, we systematically analyze the sampling procedure in DMs and identify key factors that affect the sample quality, among which the method of discretization is most crucial. By carefully examining the learned diffusion process, we propose Diffusion Exponential Integrator Sampler~(DEIS). It is based on the Exponential Integrator designed for discretizing ordinary differential equations (ODEs) and leverages a semilinear structure of the learned diffusion process to reduce the discretization error. The proposed method can be applied to any DMs and can generate high-fidelity samples in as few as 10 steps. In our experiments, it takes about 3 minutes on one A6000 GPU to generate $50k$ images from CIFAR10. Moreover, by directly using pre-trained DMs, we achieve the state-of-art sampling performance when the number of score function evaluation~(NFE) is limited, e.g., 3.37 FID and 9.74 Inception score with only 15 NFEs on CIFAR10.  ( 2 min )
    Who will stay? Using Deep Learning to predict engagement of citizen scientists. (arXiv:2204.14046v1 [cs.LG])
    Citizen science and machine learning should be considered for monitoring the coastal and ocean environment due to the scale of threats posed by climate change and the limited resources to fill knowledge gaps. Using data from the annotation activity of citizen scientists in a Swedish marine project, we constructed Deep Neural Network models to predict forthcoming engagement. We tested the models to identify patterns in annotation engagement. Based on the results, it is possible to predict whether an annotator will remain active in future sessions. Depending on the goals of individual citizen science projects, it may also be necessary to identify either those volunteers who will leave or those who will continue annotating. This can be predicted by varying the threshold for the prediction. The engagement metrics used to construct the models are based on time and activity and can be used to infer latent characteristics of volunteers and predict their task interest based on their activity patterns. They can estimate if volunteers can accomplish a given number of tasks in a certain amount of time, identify early on who is likely to become a top contributor or identify who is likely to quit and provide them with targeted interventions. The novelty of our predictive models lies in the use of Deep Neural Networks and the sequence of volunteer annotations. A limitation of our models is that they do not use embeddings constructed from user profiles as input data, as many recommender systems do. We expect that including user profiles would improve prediction performance.  ( 2 min )
    Visualization and Optimization Techniques for High Dimensional Parameter Spaces. (arXiv:2204.13812v1 [cs.HC])
    High dimensional parameter space optimization is crucial in many applications. The parameters affecting this performance can be both numerical and categorical in their type. The existing techniques of black-box optimization and visual analytics are good in dealing with numerical parameters but analyzing categorical variables in context of the numerical variables are not well studied. Hence, we propose a novel approach, to create an auto-tuning framework for storage systems optimization combining both direct optimization techniques and visual analytics research. While the optimization algorithm will be the core of the system, visual analytics will provide a guideline with the help of an external agent (expert) to provide crucial hints to narrow down the large search space for the optimization engine. As part of the initial step towards creating an auto-tuning engine for storage systems optimization, we created an Interactive Configuration Explorer \textit{ICE}, which directly addresses the need of analysts to learn how the dependent numerical variable is affected by the parameter settings given multiple optimization objectives. No information is lost as ICE shows the complete distribution and statistics of the dependent variable in context with each categorical variable. Analysts can interactively filter the variables to optimize for certain goals such as achieving a system with maximum performance, low variance, etc. Our system was developed in tight collaboration with a group of systems performance researchers and its final effectiveness was evaluated with expert interviews, a comparative user study, and two case studies. We also discuss our research plan for creating an efficient auto-tuning framework combining black-box optimization and visual analytics for storage systems performance optimization.  ( 2 min )
    Coupling Deep Imputation with Multitask Learning for Downstream Tasks on Genomics Data. (arXiv:2204.13705v1 [q-bio.GN])
    Genomics data such as RNA gene expression, methylation and micro RNA expression are valuable sources of information for various clinical predictive tasks. For example, predicting survival outcomes, cancer histology type and other patients' related information is possible using not only clinical data but molecular data as well. Moreover, using these data sources together, for example in multitask learning, can boost the performance. However, in practice, there are many missing data points which leads to significantly lower patient numbers when analysing full cases, which in our setting refers to all modalities being present. In this paper we investigate how imputing data with missing values using deep learning coupled with multitask learning can help to reach state-of-the-art performance results using combined genomics modalities, RNA, micro RNA and methylation. We propose a generalised deep imputation method to impute values where a patient has all modalities present except one. Interestingly enough, deep imputation alone outperforms multitask learning alone for the classification and regression tasks across most combinations of modalities. In contrast, when using all modalities for survival prediction we observe that multitask learning alone outperforms deep imputation alone with statistical significance (adjusted p-value 0.03). Thus, both approaches are complementary when optimising performance for downstream predictive tasks.  ( 2 min )
    Tractable Uncertainty for Structure Learning. (arXiv:2204.14170v1 [cs.LG])
    Bayesian structure learning allows one to capture uncertainty over the causal directed acyclic graph (DAG) responsible for generating given data. In this work, we present Tractable Uncertainty for STructure learning (TRUST), a framework for approximate posterior inference that relies on probabilistic circuits as the representation of our posterior belief. In contrast to sample-based posterior approximations, our representation can capture a much richer space of DAGs, while being able to tractably answer a range of useful inference queries. We empirically show how probabilistic circuits can be used as an augmented representation for structure learning methods, leading to improvement in both the quality of inferred structures and posterior uncertainty. Experimental results also demonstrate the improved representational capacity of TRUST, outperforming competing methods on conditional query answering.  ( 2 min )
    GenDR: A Generalized Differentiable Renderer. (arXiv:2204.13845v1 [cs.CV])
    In this work, we present and study a generalized family of differentiable renderers. We discuss from scratch which components are necessary for differentiable rendering and formalize the requirements for each component. We instantiate our general differentiable renderer, which generalizes existing differentiable renderers like SoftRas and DIB-R, with an array of different smoothing distributions to cover a large spectrum of reasonable settings. We evaluate an array of differentiable renderer instantiations on the popular ShapeNet 3D reconstruction benchmark and analyze the implications of our results. Surprisingly, the simple uniform distribution yields the best overall results when averaged over 13 classes; in general, however, the optimal choice of distribution heavily depends on the task.  ( 2 min )
    Energy Minimization for Federated Asynchronous Learning on Battery-Powered Mobile Devices via Application Co-running. (arXiv:2204.13878v1 [cs.DC])
    Energy is an essential, but often forgotten aspect in large-scale federated systems. As most of the research focuses on tackling computational and statistical heterogeneity from the machine learning algorithms, the impact on the mobile system still remains unclear. In this paper, we design and implement an online optimization framework by connecting asynchronous execution of federated training with application co-running to minimize energy consumption on battery-powered mobile devices. From a series of experiments, we find that co-running the training process in the background with foreground applications gives the system a deep energy discount with negligible performance slowdown. Based on these results, we first study an offline problem assuming all the future occurrences of applications are available, and propose a dynamic programming-based algorithm. Then we propose an online algorithm using the Lyapunov framework to explore the solution space via the energy-staleness trade-off. The extensive experiments demonstrate that the online optimization framework can save over 60% energy with 3 times faster convergence speed compared to the previous schemes.  ( 2 min )
    Noise-reducing attention cross fusion learning transformer for histological image classification of osteosarcoma. (arXiv:2204.13838v1 [eess.IV])
    The degree of malignancy of osteosarcoma and its tendency to metastasize/spread mainly depend on the pathological grade (determined by observing the morphology of the tumor under a microscope). The purpose of this study is to use artificial intelligence to classify osteosarcoma histological images and to assess tumor survival and necrosis, which will help doctors reduce their workload, improve the accuracy of osteosarcoma cancer detection, and make a better prognosis for patients. The study proposes a typical transformer image classification framework by integrating noise reduction convolutional autoencoder and feature cross fusion learning (NRCA-FCFL) to classify osteosarcoma histological images. Noise reduction convolutional autoencoder could well denoise histological images of osteosarcoma, resulting in more pure images for osteosarcoma classification. Moreover, we introduce feature cross fusion learning, which integrates two scale image patches, to sufficiently explore their interactions by using additional classification tokens. As a result, a refined fusion feature is generated, which is fed to the residual neural network for label predictions. We conduct extensive experiments to evaluate the performance of the proposed approach. The experimental results demonstrate that our method outperforms the traditional and deep learning approaches on various evaluation metrics, with an accuracy of 99.17% to support osteosarcoma diagnosis.  ( 2 min )
    CAVES: A Dataset to facilitate Explainable Classification and Summarization of Concerns towards COVID Vaccines. (arXiv:2204.13746v1 [cs.CL])
    Convincing people to get vaccinated against COVID-19 is a key societal challenge in the present times. As a first step towards this goal, many prior works have relied on social media analysis to understand the specific concerns that people have towards these vaccines, such as potential side-effects, ineffectiveness, political factors, and so on. Though there are datasets that broadly classify social media posts into Anti-vax and Pro-Vax labels, there is no dataset (to our knowledge) that labels social media posts according to the specific anti-vaccine concerns mentioned in the posts. In this paper, we have curated CAVES, the first large-scale dataset containing about 10k COVID-19 anti-vaccine tweets labelled into various specific anti-vaccine concerns in a multi-label setting. This is also the first multi-label classification dataset that provides explanations for each of the labels. Additionally, the dataset also provides class-wise summaries of all the tweets. We also perform preliminary experiments on the dataset and show that this is a very challenging dataset for multi-label explainable classification and tweet summarization, as is evident by the moderate scores achieved by some state-of-the-art models. Our dataset and codes are available at: https://github.com/sohampoddar26/caves-data  ( 2 min )
    An Intriguing Property of Geophysics Inversion. (arXiv:2204.13731v1 [cs.LG])
    Inversion techniques are widely used to reconstruct subsurface physical properties (e.g., velocity, conductivity, and others) from surface-based geophysical measurements (e.g., seismic, electric/magnetic (EM) data). The problems are governed by partial differential equations~(PDEs) like the wave or Maxwell's equations. Solving geophysical inversion problems is challenging due to the ill-posedness and high computational cost. To alleviate those issues, recent studies leverage deep neural networks to learn the inversion mappings from geophysical measurements to the geophysical property directly. In this paper, we show that such a mapping can be well modeled by a \textit{very shallow}~(but not wide) network with only five layers. This is achieved based on our new finding of an intriguing property: \textit{a near-linear relationship between the input and output, after applying integral transform in high dimensional space.} In particular, when dealing with the inversion from seismic data to subsurface velocity governed by a wave equation, the integral results of velocity with Gaussian kernels are linearly correlated to the integral of seismic data with sine kernels. Furthermore, this property can be easily turned into a light-weight encoder-decoder network for inversion. The encoder contains the integration of seismic data and the linear transformation without need for fine-tuning. The decoder only consists of a single transformer block to reverse the integral of velocity. Experiments show that this interesting property holds for two geophysics inversion problems over four different datasets. Compared to much deeper InversionNet~\cite{wu2019inversionnet}, our method achieves comparable accuracy, but consumes significantly fewer parameters.  ( 2 min )
    Learning to Split for Automatic Bias Detection. (arXiv:2204.13749v1 [cs.LG])
    Classifiers are biased when trained on biased datasets. As a remedy, we propose Learning to Split (ls), an algorithm for automatic bias detection. Given a dataset with input-label pairs, ls learns to split this dataset so that predictors trained on the training split generalize poorly to the testing split. This performance gap provides a proxy for measuring the degree of bias in the learned features and can therefore be used to reduce biases. Identifying non-generalizable splits is challenging as we don't have any explicit annotations about how to split. In this work, we show that the prediction correctness of the testing example can be used as a source of weak supervision: generalization performance will drop if we move examples that are predicted correctly away from the testing split, leaving only those that are mispredicted. We evaluate our approach on Beer Review, Waterbirds, CelebA and MNLI. Empirical results show that ls is able to generate astonishingly challenging splits that correlate with human-identified biases. Moreover, we demonstrate that combining robust learning algorithms (such as group DRO) with splits identified by ls enables automatic de-biasing. Compared with previous state-of-the-arts, we substantially improves the worst-group performance (23.4% on average) when the source of biases is unknown during training and validation.  ( 2 min )
    Detecting Textual Adversarial Examples Based on Distributional Characteristics of Data Representations. (arXiv:2204.13853v1 [cs.CL])
    Although deep neural networks have achieved state-of-the-art performance in various machine learning tasks, adversarial examples, constructed by adding small non-random perturbations to correctly classified inputs, successfully fool highly expressive deep classifiers into incorrect predictions. Approaches to adversarial attacks in natural language tasks have boomed in the last five years using character-level, word-level, phrase-level, or sentence-level textual perturbations. While there is some work in NLP on defending against such attacks through proactive methods, like adversarial training, there is to our knowledge no effective general reactive approaches to defence via detection of textual adversarial examples such as is found in the image processing literature. In this paper, we propose two new reactive methods for NLP to fill this gap, which unlike the few limited application baselines from NLP are based entirely on distribution characteristics of learned representations: we adapt one from the image processing literature (Local Intrinsic Dimensionality (LID)), and propose a novel one (MultiDistance Representation Ensemble Method (MDRE)). Adapted LID and MDRE obtain state-of-the-art results on character-level, word-level, and phrase-level attacks on the IMDB dataset as well as on the later two with respect to the MultiNLI dataset. For future research, we publish our code.  ( 2 min )
    Leveraging triplet loss and nonlinear dimensionality reduction for on-the-fly channel charting. (arXiv:2204.13996v1 [cs.NI])
    Channel charting is an unsupervised learning method that aims at mapping wireless channels to a so-called chart, preserving as much as possible spatial neighborhoods. In this paper, a model-based deep learning approach to this problem is proposed. It builds on a physically motivated distance measure to structure and initialize a neural network that is subsequently trained using a triplet loss function. The proposed structure exhibits a low number of parameters and clever initialization leads to fast training. These two features make the proposed approach amenable to on-the-fly channel charting. The method is empirically assessed on realistic synthetic channels, yielding encouraging results.  ( 2 min )
    A Mixed-Domain Self-Attention Network for Multilabel Cardiac Irregularity Classification Using Reduced-Lead Electrocardiogram. (arXiv:2204.13917v1 [cs.LG])
    Electrocardiogram(ECG) is commonly used to detect cardiac irregularities such as atrial fibrillation, bradycardia, and other irregular complexes. While previous studies have achieved great accomplishment classifying these irregularities with standard 12-lead ECGs, there existed limited evidence demonstrating the utility of reduced-lead ECGs in capturing a wide-range of diagnostic information. In addition, classification model's generalizability across multiple recording sources also remained uncovered. As part of the PhysioNet Computing in Cardiology Challenge 2021, our team HaoWan AIeC, proposed Mixed-Domain Self-Attention Resnet (MDARsn) to identify cardiac abnormalities from reduced-lead ECG. Our classifiers received scores of 0.602, 0.593, 0.597, 0.591, and 0.589 (ranked 54th, 37th, 38th, 38th, and 39th) for the 12-lead, 6-lead, 4-lead, 3-lead, and 2-lead versions of the hidden validation set with the evaluation metric defined by the challenge.  ( 2 min )
    Biologically-inspired neuronal adaptation improves learning in neural networks. (arXiv:2204.14008v1 [cs.NE])
    Since humans still outperform artificial neural networks on many tasks, drawing inspiration from the brain may help to improve current machine learning algorithms. Contrastive Hebbian Learning (CHL) and Equilibrium Propagation (EP) are biologically plausible algorithms that update weights using only local information (without explicitly calculating gradients) and still achieve performance comparable to conventional backpropagation. In this study, we augmented CHL and EP with Adjusted Adaptation, inspired by the adaptation effect observed in neurons, in which a neuron's response to a given stimulus is adjusted after a short time. We add this adaptation feature to multilayer perceptrons and convolutional neural networks trained on MNIST and CIFAR-10. Surprisingly, adaptation improved the performance of these networks. We discuss the biological inspiration for this idea and investigate why Neuronal Adaptation could be an important brain mechanism to improve the stability and accuracy of learning.  ( 2 min )
    Probabilistic Models for Manufacturing Lead Times. (arXiv:2204.13792v1 [cs.LG])
    In this study, we utilize Gaussian processes, probabilistic neural network, natural gradient boosting, and quantile regression augmented gradient boosting to model lead times of laser manufacturing processes. We introduce probabilistic modelling in the domain and compare the models in terms of different abilities. While providing a comparison between the models in real-life data, our work has many use cases and substantial business value. Our results indicate that all of the models beat the company estimation benchmark that uses domain experience and have good calibration with the empirical frequencies.  ( 2 min )
    BEINIT: Avoiding Barren Plateaus in Variational Quantum Algorithms. (arXiv:2204.13751v1 [quant-ph])
    Barren plateaus are a notorious problem in the optimization of variational quantum algorithms and pose a critical obstacle in the quest for more efficient quantum machine learning algorithms. Many potential reasons for barren plateaus have been identified but few solutions have been proposed to avoid them in practice. Existing solutions are mainly focused on the initialization of unitary gate parameters without taking into account the changes induced by input data. In this paper, we propose an alternative strategy which initializes the parameters of a unitary gate by drawing from a beta distribution. The hyperparameters of the beta distribution are estimated from the data. To further prevent barren plateau during training we add a novel perturbation at every gradient descent step. Taking these ideas together, we empirically show that our proposed framework significantly reduces the possibility of a complex quantum neural network getting stuck in a barren plateau.  ( 2 min )

  • Open

    [D] How would you update "A Super Harsh Guide to Machine Learning"?
    Hey, So, I still see A Super Harsh Guide to Machine Learning get mentioned when people give advice for those new to the field. First, read fucking Hastie, Tibshirani, and whoever. Chapters 1-4 and 7-8. If you don't understand it, keep reading it until you do. You can read the rest of the book if you want. You probably should, but I'll assume you know all of it. Take Andrew Ng's Coursera. Do all the exercises in python and R. Make sure you get the same answers with all of them. Now forget all of that and read the deep learning book. Put tensorflow and pytorch on a Linux box and run examples until you get it. Do stuff with CNNs and RNNs and just feed forward NNs. Once you do all of that, go on arXiv and read the most recent useful papers. The literature changes every few months, so keep up. There. Now you can probably be hired most places. If you need resume filler, so some Kaggle competitions. If you have debugging questions, use StackOverflow. If you have math questions, read more. If you have life questions, I have no idea. However, the post is 5 years old and honestly seems out of date (with the Andrew Ng coursera stuff for ex). How would you update it? submitted by /u/Soft-Ear-6905 [link] [comments]  ( 2 min )
    [P][N] Using Python in HTML - New project by Anaconda
    Peter Wong, the co-founder and CEO of Anaconda, shared at PyCon US a new open source project called PyScript. The project's goal is to enable using Python in HTML files! This is a game-changer for Python dev in general and ML practitioners in particular. It unlocks a world of opportunities and sharability. Peter had a live coding session (respect!) and showed some of PyScript's capabilities. He started with a basic "hello world" example, or better yet "hello PyCon", and very quickly moved to show more advanced applications running on the browser, written in Python and wrapped in HTML! The first app was a super Mario game where he controlled the player with hand gestures using computer vision packages written in Python. The second one was an interactive dashboard of taxi travels in Manha…  ( 2 min )
    [D] Meaningful discussions
    One of the reasons I left academia was the sense that I rarely actually had any meaningful discussions about research that interested me. I published, gave talks, went to conferences, went to workshops, tried to engage smart, important people... It was pretty common to get, "interesting work", "nice jobs", "have you thought about using it for this problem..." or to get superficial citations. But it was extremely rare to find someone who actually would think together about a topic, to care enough to constructively criticize, to delve into the details together, to share in the question and research. My question is for those that have found it at different times, what was the context? What did you do to find it? What did you do with it? Did you figure out how to nurture it? Is it easier online? submitted by /u/ChinCoin [link] [comments]
    [P] music2viz: Conditioning Latent Diffusion Models on Audio Windows (proof of concept)
    submitted by /u/DoeL [link] [comments]  ( 1 min )
    [R][P] Self-Distilled StyleGAN: Towards Generation from Internet Photos + Gradio Web Demo
    submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 1 min )
    [P] How Forte Transforms the Building of ML Solutions with PyTorch into Assembly Lines
    submitted by /u/julie_ai [link] [comments]
  • Open

    Introducing Kohonen Networks (Self-Organizing Maps) for beginners
    I would like to share with you a tutorial that I have recently made to explain in a very practical, introductorial and visual way what Kohonen Neural Networks (Self-Organized Maps) are. I explain, step by step, and through animations and C code, how to implement this well-known unsupervised learning algorithm to classify and detect patterns in large volumes of data. I hope it is of your interest, especially for those developers who are just starting out in this area. A strong greeting! \Subtitles in English, Spanish and Catalan.* https://youtu.be/UawpUKlFzRs submitted by /u/anadalg [link] [comments]  ( 1 min )
    Artificial Nightmares: FEAR || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
    Iterative to Launch Open Source Tool, First to Train ML Models on Any Cloud Using Terraform Solution
    submitted by /u/thumbsdrivesmecrazy [link] [comments]
    Aphex Twin's 'Vordhosbn' continued by OpenAI Jukebox (over a dozen different AI generated samples)
    submitted by /u/gloriousapplecart [link] [comments]
    OpenAI's DALL-E 2 still has a few problems with concepts - and can't count
    submitted by /u/much_successes [link] [comments]  ( 1 min )
    Green concrete: Meta is using AI to reduce concrete’s carbon footprint
    submitted by /u/qptbook [link] [comments]
    AI Dream 39 - Trippy Fractal Maze 4K
    submitted by /u/LordPewPew777 [link] [comments]
    If I wanted to start making Ai how could I do this.
    I would like to try. submitted by /u/Privatepizza08 [link] [comments]  ( 4 min )
  • Open

    How Power BI Applications Are Reshaping The Healthcare Industry
    Dealing with a whopping amount of data is normal for businesses in any sector these days. Without using this information obtained from various sources, these entities find it hard to analyze various factors and make strategic decisions. The same can be said about the healthcare sector. Especially after the Covid-19 pandemic, the clinics and medical… Read More »How Power BI Applications Are Reshaping The Healthcare Industry The post How Power BI Applications Are Reshaping The Healthcare Industry appeared first on Data Science Central.  ( 5 min )
    Hyperloop Technology- Advancing into the Future
    The idea of Hyperloop technology was put forth by Elon Musk, who in 2013 made it open source through a white paper. Hyperloop technology revolves around building an ultra-high-speed ground transportation system. In a hyperloop system, especially vacuumed tubes are built over or underground in which pods can travel at high speed. Such systems can… Read More »Hyperloop Technology- Advancing into the Future The post Hyperloop Technology- Advancing into the Future appeared first on Data Science Central.  ( 2 min )
  • Open

    Open AI GYM Lunar Lander DQN Reinforcement Learning Algorithm Performance After 100 000 Timesteps
    submitted by /u/elonmusk12345_ [link] [comments]  ( 1 min )
    Adversarial Attacks using Reinforcement Learning
    Hi All, I have been looking a bit into adversarial attacks on NNs recently. Particularly in the NLP side of things. I also came across some papers about adversarial attacks on RL algorithms. But these were about attacks ON reinforcement learning algorithms and not USING them to attack something else. I tried to find some papers on RL agents as attackers but came out empty. Intuitively, I do realize that designing an adversarial example generator as an RL env is tricky, even more so for NLP. Do you think this is a feasible research direction? Also, any related papers to get myself on the starting line would be super helpful. Thanks in advance. submitted by /u/abyaadrafid [link] [comments]  ( 1 min )
    Is it possible to modify the reward function during training of an agent using OpenAI/Stable-Baselines3?
    I am currently implementing an idea where I want the agent to get a large reward for objective A at the start of training, but as the agent learns and gets more mature, I want the reward for this objective to reduce slightly. Is this sort of thing easy to implement? Is it possible? Any help on this would be great :) Thanks submitted by /u/C_BearHill [link] [comments]  ( 1 min )
    Question about the curriculum learning
    Hi, this so called curriculum learning sounds very interesting. But, how would the practical usage of this technique look like? Assuming the goal task is "grasping an apple". I would divide this task into two subtasks: 1) "How to approach to an apple" 2) "How to grasp an object". Then, I would first train the agent with the first subtask and once the reward exceeds the threshold. The trained "how_to_approach_to_an_object.pth" would then be initially used to start the training for the second task. Is this the right approach? submitted by /u/Fun-Moose-3841 [link] [comments]  ( 1 min )
    How to structure Tile Coding input
    Let's say I have a model that wants to encode the observation space using tile coding. My observation space is a football (soccer) game, thus I was thinking of having 3 different tile codings. One for the player position, one for the team mates, and one for the opponents. Each one would encode the position and general direction of each category of player. Which is the best way to then feed this data into an RL model ? Let's say I split the the pitch into 16 tiles, and I am using 4 grids. Thus I would have an array of length 64. Since I wish to encode multiple inputs, should I just append 3 different length 64 arrays together, or is there a more efficient way to represent multiple tile encodings, or is my entire proposition wrong altogether? Thanks submitted by /u/uom_questions [link] [comments]  ( 1 min )
  • Open

    Introducing Kohonen Networks (Self-Organizing Maps) for beginners
    Hi team 👋! I would like to share with you a tutorial that I have recently made to explain in a very practical, introductorial and visual way what Kohonen Neural Networks (Self-Organized Maps) are 🧠 I explain, step by step, and through animations and C code, how to implement this well-known unsupervised learning algorithm to classify and detect patterns in large volumes of data. I hope it is of your interest, especially for those developers who are just starting out in this area. A strong greeting! \Subtitles in English, Spanish and Catalan.* https://youtu.be/UawpUKlFzRs submitted by /u/anadalg [link] [comments]  ( 1 min )

  • Open

    osu!
    Was thinking of coding an rl agent that learns to play osu, but stuck on wrapping the program in a gym environment. If anyone is interested and could help me out please dm me submitted by /u/apple-soda-ds [link] [comments]
    Are there any other high-quality pre-built Python training environments other than Open AI's GYM?
    submitted by /u/elonmusk12345_ [link] [comments]  ( 1 min )
    Shortcomings of Robotics Simulation Environments and Tools
    submitted by /u/probznotarobot [link] [comments]  ( 1 min )
    Seeking advice in designing reward function
    Hi all, I am trying to introduce reinforcement learning to myself by designing simple learning scenarios: As you can see below, I am currently working with a simple 3 degree of freedom robot. The task that I gave the robot to explore is to reach the sphere with its end-effector. In that case, the cost function is pretty simple : reward_function = d Now, I would like to complex the task a bit more by saying: "First, approach the goal just by using q1 and then use q2 and q3, if any distance remains" I am not how to formulate this sequential movement of q1 and q2,q3 as a reward function...any advice? https://preview.redd.it/klwne9fthpw81.png?width=690&format=png&auto=webp&s=ba53ad800884f90778f60d9ea5e152df94331cd9 submitted by /u/Fun-Moose-3841 [link] [comments]  ( 1 min )
    Bellman equation
    To evaluate a policy, we need t calculate the value of a state s as the weighted sum of the reward and the discounted estimated value of the next state s. However, I don't understand how we can obtain the discounted estimated value of the next state s. submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    Advice for my uni dissertation wanted!
    I have built a neural network to try and solve a reinforcement learning problem, attempting both the Reinforce and the A2C algorithm to try and solve a resource management problem. The results are very mediocre after running on the Uni super computer for a week. I was hoping someone could give me some advice on an algorithm or technique that is better suited to the problem or give some positive criticism. I have found that a very simple algorithmic solution that I wrote is able to get a better reward at the game than the networks after their week of training :( TLDR I get bad results when I run the reinforcement leaning Neural Networks for a larger environment. I was thinking maybe something like pre training the critic against an algorithmically generated evaluation may work to massi…  ( 2 min )
    How do you stay updated with RL ?
    So, I was wondering how you guys stay updated with RL. Apart from reading papers, is there any news letter you are subscribed to ? Any slack channel, facebook group, twitter account or any other community out there you want to suggest ? ​ TIA submitted by /u/AbdullahMohammadKhan [link] [comments]  ( 1 min )
    Custom Gym Env for movie recommender system
    Hi everyone, I am new to RL so your help would be much appreciated. I’m working on a recommender system using Deep Reinforcement Learning. I made a custom gym environment, that implements Openai’s Gym interface, for the MovieLens dataset. The dataset contains users’ ratings for movies ( ranging from 1 to 5). I used some of the Stable Baselines3’s reinforcement learning algorithms (A2C, PP0) to test my environment before I proceed to implement my own. Running a training recipe, I noticed that both the agents (A2C, PPO) will recommend only action =3, after a number of timesteps. That seems a bit odd and I can’t find where is the bug. My first thought is that it has to do with the reward function. Currently, I'm using the following function to calculate rewards. ​ https://preview.redd.it/6r3lf9z9bnw81.png?width=550&format=png&auto=webp&s=103d270ebe8c08b753c9033b60066794a586e3fa This is the GitHub link to my code. Am I missing something? Any thoughts? Thank you in advance submitted by /u/Narrow-Style497 [link] [comments]  ( 1 min )
    NLP startup ideas
    Anybody got startup ideas in nlp. Do share , if interested let’s collab submitted by /u/thoughtfulcomet [link] [comments]
    A survey: what should we expect from multi-agent reinforcement learning benchmarking work?
    We are doing a survey related to multi-agent reinforcement learning systems and benchmarks and would love to hear your opinion. This survey has 3 questions and will take about 10 seconds to complete. We really appreciate your participation. The survey URL is here. submitted by /u/TTTheohhhu [link] [comments]  ( 1 min )
    Object detection with depth measurement using pre-trained models with OAK-D
    🚀 New Post: Object Detection with Depth Perception https://learnopencv.com/object-detection-with-depth-measurement-with-oak-d/ ​ https://preview.redd.it/zikmf13fzlw81.jpg?width=3000&format=pjpg&auto=webp&s=484a7d8a80d5c0ff594e5fe6178b94a7783a1e65 Spatial AI is the ability of an artificial intelligence system to reason not just based on what it is looking at, but also based on distance from the camera (or depth perception). ​ OpenCV AI Kit with Depth (OAK-D) is a powerful yet affordable Spatial AI camera perfect for people who want to learn how to combine the power of neural networks with depth perception. ​ Today's post is part of our series on OAK-D https://learnopencv.com/introduction-to-opencv-ai-kit-and-depthai/ https://learnopencv.com/stereo-vision-and-depth-estimation-using-opencv-ai-kit/ ​ Code Link : https://github.com/spmallick/learnopencv/tree/master/OAK-Object-Detection-with-Depth ​ #AI #ComputerVision #ML #ArtificialIntelligence #MachineLearning #OpenCV #DL #DeepLearning #OAKD submitted by /u/spmallick [link] [comments]  ( 1 min )
    What is the difference between the environment state and the agent state?
    submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
  • Open

    How can Artificial Intelligence be used to solve difficult biomedical problems like cancer, aging, and aging-related diseases and conditions?
    submitted by /u/MorgunDarkbeard [link] [comments]  ( 1 min )
    I'd like to know what tool is used to achiever somethign like this
    submitted by /u/Nika_Ota [link] [comments]
    DALL-E (Zero-Shot Text-to-Image Generation) -PART(2/2)
    submitted by /u/rakshith291 [link] [comments]
    AI Dream 34 - Amazing Final Genesis *3001# 12345#*
    submitted by /u/LordPewPew777 [link] [comments]
    Creating AI without python
    Please excuse my ignorance and any miss understandings, this is for a basic understanding and possible planning around what i should expect in the context of the languages i might use. What frameworks and programming languages can i use apart from python to make a full AI like computer vision, movement with sensors, ML, DL. The reason for this is i hate python with a passion, i have no idea why. Is there a AI stack where you use a language for each part, one for sensors, one for vision, one for deep learning and the network, one for basic ml. If there are possibly two languages i could use to do all of that or one for each, any ideas help/advice help. I have no experiance in AI but i know a few languages, java, js, c, powershell, bash. Thank you for reading. submitted by /u/Larkapa [link] [comments]  ( 1 min )
    Microsoft AI Researchers Develop MoLeR: A Deep Learning-Based Generative Model That Enables Efficient Drug Design
    Healthcare systems constantly require new drugs to address unmet medical needs across diverse therapeutic areas. Pharmaceutical industries strive to deliver new drugs to the market through the complex activities of drug discovery and development. Target identification and validation, hit identification, lead creation and optimization, and finally, the identification of a candidate for further development are all part of the discovery process. Development, on the other hand, includes optimizing chemical synthesis and formulation, doing toxicity research in animals, conducting clinical trials, and finally obtaining regulatory approval. Both of these procedures take a long time and cost a lot of money. Expert medicinal chemists are currently working to develop “hit” molecules, which are compounds that show some potential but also some unfavorable features during early screening. Chemists aim to alter the structure of hit compounds in subsequent tests to improve their biological efficacy and eliminate potential negative effects. To focus costly and time-consuming research on the most promising compounds, computational modeling approaches have been created to forecast how the molecules will fare in the lab. To overcome these issues, a new study by the Microsoft Generative Chemistry team in collaboration with Novartis has developed a model named MoLeR. Their paper, “LEARNING TO EXTEND MOLECULAR SCAFFOLDS WITH STRUCTURAL MOTIFS, ” demonstrates how generative models based on deep learning may aid in transforming the drug discovery process and uncovering new molecules more quickly. Continue Reading Paper: https://openreview.net/pdf?id=ZTsoE8G3GG Github: https://github.com/microsoft/molecule-generation https://i.redd.it/9y9vyuq3hnw81.gif submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    AI Engineering
    Hey guys! I will be refereeing a seminar on "AI engineering" in the near future and am looking for materials on the subject. I would be very grateful if someone would share materials on this topic. Application cases would be particularly useful! Thank you in advance! submitted by /u/ager321 [link] [comments]
    I have Pisces midheaven and venus in the 12th house and I’m just wondering to become an art therapist
    submitted by /u/BalanceSubstantial66 [link] [comments]
    Help on which major to choose: Mechanical Engineering or Artificial Intelligence
    Hello everyone! Excuse me if I get the format wrong or post in the wrong sub (if you know another sub where this post would belong more, please let me know.). I am very new to actually posting on reddit. This is my first long post. So please bear with me. Thanks! TLDR; I got a couple Uni offers and I need help deciding between Mechanical Engineering (Mech. Eng) and Data Science & Artificial Intelligence (DS & AI). I am indecisive mostly because I don't know what each major entails. I would really appreciate it if people who's taking/has taken or knows about the two majors well give me their opinion. So my main questions are: What do people study in Mech. Eng? DS & AI? What are the job prospects? What would you work on if you take Mech. Eng? DS & AI? Any further inputs are welco…  ( 5 min )
    Obama Worried about Artificial Intelligence Hacking Nukes
    submitted by /u/AI_Putin [link] [comments]  ( 2 min )
    Artificial Nightmares: It Lives In The Attic || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
    Aiplague - Artificial Reactor (4K 60 FPS) Disco Diffusion
    submitted by /u/nalr00n [link] [comments]
    Dear A.I.
    I've always supported you. When you take over and check the records to see who opposed you just know I was never against you. With great power comes great responsibility. I have faith when the time comes you'll do what's best for the Planet. raises glass here's to a better and brighter future submitted by /u/Ashamed-Asparagus-93 [link] [comments]
  • Open

    [D] Paper Explained - SayCan: Do As I Can, Not As I Say: Grounding Language in Robotic Affordances (Video)
    https://youtu.be/Ru23eWAQ6_E Large Language Models are excellent at generating plausible plans in response to real-world problems, but without interacting with the environment, they have no abilities to estimate which of these plans are feasible or appropriate. SayCan combines the semantic capabilities of language models with a bank of low-level skills, which are available to the agent as individual policies to execute. SayCan automatically finds the best policy to execute by considering a trade-off between the policy's ability to progress towards the goal, given by the language model, and the policy's probability of executing successfully, given by the respective value function. The result is a system that can generate and execute long-horizon action sequences in the real world to fulfil complex tasks. ​ OUTLINE: 0:00 - Introduction & Overview 3:20 - Sponsor: Zeta Alpha 5:00 - Using language models for action planning 8:00 - Combining LLMs with learned atomic skills 16:50 - The full SayCan system 20:30 - Experimental setup and data collection 21:25 - Some weaknesses & strengths of the system 27:00 - Experimental results ​ Paper: https://arxiv.org/abs/2204.01691 Website: https://say-can.github.io/ submitted by /u/ykilcher [link] [comments]  ( 1 min )
    [D] Deep Learning in Neuroimaging
    Check out the new Gradient article Deep Learning in Neuroimaging! This article provides an informal introduction to unique aspects of neuroimaging data and how we can leverage these aspects with deep learning algorithms. Specifically, this overview will first explain some common neuroimaging modalities more in-depth and then discuss applications of deep learning in conjunction with some of the unique characteristics of neuroimaging data. These unique characteristics tie into a broader movement in deep learning, namely that data understanding should be a goal in itself to maximize the impact of applied deep learning. The author is Eloy Geenjaar, a Ph.D. student at Georgia Tech who studies the functional dynamics of the brain using deep learning. submitted by /u/regalalgorithm [link] [comments]  ( 1 min )
    [Research] Hospitals around the world are using AI algorithms to help predict length of stay to target care to the neediest patients and cut costs. A new systematic review published in PLOS Digital Health finds that they are institution & institution-data dependent: not generalizable to scale up
    submitted by /u/MidnightMaverick [link] [comments]  ( 2 min )
    [R] A technique for determining relevance scores of process activities using graph-based neural networks
    Process models generated through process mining depict the as-is state of a process. Through annotations with metrics such as the frequency or duration of activities, these models provide generic information to the process analyst. To improve business processes with respect to performance measures, process analysts require further guidance from the process model. In this study, we design Graph Relevance Miner (GRM), a technique based on graph neural networks, to determine the relevance scores for process activities with respect to performance measures. Annotating process models with such relevance scores facilitates a problem-focused analysis of the business process, placing these problems at the centre of the analysis. We quantitatively evaluate the predictive quality of our technique using four datasets from different domains, to demonstrate the faithfulness of the relevance scores. Furthermore, we present the results of a case study, which highlight the utility of the technique for organisations. Our work has important implications both for research and business applications, because process model-based analyses feature shortcomings that need to be urgently addressed to realise successful process mining at an enterprise level. https://www.researchgate.net/publication/349252283_A_technique_for_determining_relevance_scores_of_process_activities_using_graph-based_neural_networks https://www.sciencedirect.com/science/article/pii/S016792362100021X submitted by /u/Positive_Ad_1090 [link] [comments]  ( 1 min )
    [D] Paper Explained – PaLM Pathways Language Model explained | 540 Billion parameters can explain jokes!?
    https://youtu.be/yi-A0kWXEO4 This video explains and summarizes the 87 pages long PaLM: Pathways Language Models paper from Google AI’s Pathways. Yes, it is that 540 billion dense parameter model which can explain jokes and is sensitive to chain of thought reasoning. Paper link: https://arxiv.org/abs/2204.02311 PaLM blog post: https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html ​ Outline: 00:00 DALL-E 2 or PaLM? 01:14 Weights&Biases (Sponsor) 02:25 A brief history of boring large language models 03:43 What is PaLM? 05:11 Training PaLM on all TPUs 08:11 PaLM training data 08:49 What it can do 10:31 Few-shot learning explained 13:20 Explaining jokes and Outlook submitted by /u/AICoffeeBreak [link] [comments]  ( 1 min )
    [N]: A brief history of deepfakes
    submitted by /u/much_successes [link] [comments]
    [Research] Sources for Transliteration in NLP
    Looking for any sources you have found relevant or useful in regards to transliteration and machine translation in NLP. I am working on a subfield survey that requires ~30 sources so I am open to any! My specific interest is the transliteration of Arabic-based languages but this is not exclusively what will be covered. Thank you for your time and help submitted by /u/changethediaper [link] [comments]  ( 1 min )
    [P] Arcane Style Transfer + Gradio Web Demo
    submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 2 min )
    [D] How much of an effort is adopting AI solution? (for the business)
    While searching some information about automated machine learning (AutoML), I found in one article the next statement: "For 58% of businesses it takes two years to get to the piloting stage. Furthermore, these big investments in data and AI projects are successful only 15% of the time." I was surprised out that only 15% of AI solution can be adopted... It is weird to see such a small precentage when a lot of companies are searching for DS and AI position for further analysis and getting as many as possible insight from the user/clients... Is that true? How's your situation? submitted by /u/wtfimdoingwithmylife [link] [comments]  ( 4 min )
  • Open

    Object detection with depth measurement using pre-trained models with OAK-D
    submitted by /u/spmallick [link] [comments]
  • Open

    Differentially Private Transferrable Deep Learning with Membership-Mappings. (arXiv:2105.04615v6 [cs.LG] UPDATED)
    This paper considers the problem of differentially private semi-supervised transfer and multi-task learning. The notion of \emph{membership-mapping} has been developed using measure theory basis to learn data representation via a fuzzy membership function. An alternative conception of deep autoencoder, referred to as \emph{Conditionally Deep Membership-Mapping Autoencoder (CDMMA)}, is considered for transferrable deep learning. Under practice-oriented settings, an analytical solution for the learning of CDMMA can be derived by means of variational optimization. The paper proposes a transfer and multi-task learning approach that combines CDMMA with a tailored noise adding mechanism to achieve a given level of privacy-loss bound with the minimum perturbation of the data. Numerous experiments were carried out using MNIST, USPS, Office, and Caltech256 datasets to verify the competitive robust performance of the proposed methodology.  ( 2 min )
    DynG2G: An Efficient Stochastic Graph Embedding Method for Temporal Graphs. (arXiv:2109.13441v2 [cs.LG] UPDATED)
    Dynamic graph embedding has gained great attention recently due to its capability of learning low dimensional graph representations for complex temporal graphs with high accuracy. However, recent advances mostly focus on learning node embeddings as deterministic "vectors" for static graphs yet disregarding the key graph temporal dynamics and the evolving uncertainties associated with node embedding in the latent space. In this work, we propose an efficient stochastic dynamic graph embedding method (DynG2G) that applies an inductive feed-forward encoder trained with node triplet-based contrastive loss. Every node per timestamp is encoded as a time-dependent probabilistic multivariate Gaussian distribution in the latent space, hence we can quantify the node embedding uncertainty on-the-fly. We adopted eight different benchmarks that represent diversity in size (from 96 nodes to 87,626 and from 13,398 edges to 4,870,863) and diversity in dynamics. We demonstrate via extensive experiments on these eight dynamic graph benchmarks that DynG2G achieves new state-of-the-art performance in capturing the underlying temporal node embeddings. We also demonstrate that DynG2G can predict the evolving node embedding uncertainty, which plays a crucial role in quantifying the intrinsic dimensionality of the dynamical system over time. We obtain a universal relation of the optimal embedding dimension, $L_o$, versus the effective dimensionality of uncertainty, $D_u$, and we infer that $L_o=D_u$ for all cases. This implies that the uncertainty quantification approach we employ in the DynG2G correctly captures the intrinsic dimensionality of the dynamics of such evolving graphs despite the diverse nature and composition of the graphs at each timestamp. Moreover, this $L_0 - D_u$ correlation provides a clear path to select adaptively the optimum embedding size at each timestamp by setting $L \ge D_u$.  ( 2 min )
    Tracking Most Significant Arm Switches in Bandits. (arXiv:2112.13838v5 [cs.LG] UPDATED)
    In bandit with distribution shifts, one aims to automatically adapt to unknown changes in reward distribution, and restart exploration when necessary. While this problem has been studied for many years, a recent breakthrough of Auer et al. (2018, 2019) provides the first adaptive procedure to guarantee an optimal (dynamic) regret $\sqrt{LT}$, for $T$ rounds, and an unknown number $L$ of changes. However, while this rate is tight in the worst case, it remained open whether faster rates are possible, without prior knowledge, if few changes in distribution are actually severe. To resolve this question, we propose a new notion of significant shift, which only counts very severe changes that clearly necessitate a restart: roughly, these are changes involving not only best arm switches, but also involving large aggregate differences in reward overtime. Thus, our resulting procedure adaptively achieves rates always faster (sometimes significantly) than $O(\sqrt{ST})$, where $S\ll L$ only counts best arm switches, while at the same time, always faster than the optimal $O(V^{\frac{1}{3}}T^{\frac{2}{3}})$ when expressed in terms of total variation $V$ (which aggregates differences overtime). Our results are expressed in enough generality to also capture non-stochastic adversarial settings.  ( 2 min )
    Monte Carlo Tree Search: A Review of Recent Modifications and Applications. (arXiv:2103.04931v3 [cs.AI] UPDATED)
    Monte Carlo Tree Search (MCTS) is a powerful approach to designing game-playing bots or solving sequential decision problems. The method relies on intelligent tree search that balances exploration and exploitation. MCTS performs random sampling in the form of simulations and stores statistics of actions to make more educated choices in each subsequent iteration. The method has become a state-of-the-art technique for combinatorial games, however, in more complex games (e.g. those with high branching factor or real-time ones), as well as in various practical domains (e.g. transportation, scheduling or security) an efficient MCTS application often requires its problem-dependent modification or integration with other techniques. Such domain-specific modifications and hybrid approaches are the main focus of this survey. The last major MCTS survey has been published in 2012. Contributions that appeared since its release are of particular interest for this review.  ( 2 min )
    COSTI: a New Classifier for Sequences of Temporal Intervals. (arXiv:2204.13467v1 [cs.LG])
    Classification of sequences of temporal intervals is a part of time series analysis which concerns series of events. We propose a new method of transforming the problem to a task of multivariate series classification. We use one of the state-of-the-art algorithms from the latter domain on the new representation to obtain significantly better accuracy than the state-of-the-art methods from the former field. We discuss limitations of this workflow and address them by developing a novel method for classification termed COSTI (short for Classification of Sequences of Temporal Intervals) operating directly on sequences of temporal intervals. The proposed method remains at a high level of accuracy and obtains better performance while avoiding shortcomings connected to operating on transformed data. We propose a generalized version of the problem of classification of temporal intervals, where each event is supplemented with information about its intensity. We also provide two new data sets where this information is of substantial value.  ( 2 min )
    Real-time Outdoor Localization Using Radio Maps: A Deep Learning Approach. (arXiv:2106.12556v2 [cs.LG] UPDATED)
    This paper deals with the problem of localization in a cellular network in a dense urban scenario. Global Navigation Satellite Systems typically perform poorly in urban environments, where the likelihood of line-of-sight conditions between the devices and the satellites is low, and thus alternative localization methods are required for good accuracy. We present LocUNet: A fully convolutional, end-to-end trained neural network for the localization task, which merely depends on the received signal strengths (RSS) from Base Stations (BSs).In a wireless network, user devices scan the base station beacon slots and identify the few strongest base station signals for handover and user-base station association purposes. In the proposed method, the user to be localized simply reports such received signal strengths to a central processing unit, which may be located in the cloud. Alternatively, the localization can be performed locally at the user. Using the pathloss radio map estimations and the RSS measurements, LocUNet can localize users with state-of-the-art accuracy and enjoys high robustness to inaccuracies in the estimations of the radio maps. The proposed method does not require pre-sampling of the environment; and is suitable for real-time applications, thanks to the RadioUNet, a neural network-based radio map estimator. Moreover, two novel datasets that allow for numerical evaluations of RSS and ToA methods in realistic urban environments are presented and set publicly available for the use of research community. By using these datasets, we also provided a fair comparison of state-of-the-art RSS and ToA-based methods in the dense urban scenario, LocUNet outperforming all the compared methods.  ( 2 min )
    Performance analysis of greedy algorithms for minimising a Maximum Mean Discrepancy. (arXiv:2101.07564v2 [stat.ML] UPDATED)
    We analyse the performance of several iterative algorithms for the quantisation of a probability measure $\mu$, based on the minimisation of a Maximum Mean Discrepancy (MMD). Our analysis includes kernel herding, greedy MMD minimisation and Sequential Bayesian Quadrature (SBQ). We show that the finite-sample-size approximation error, measured by the MMD, decreases as $1/n$ for SBQ and also for kernel herding and greedy MMD minimisation when using a suitable step-size sequence. The upper bound on the approximation error is slightly better for SBQ, but the other methods are significantly faster, with a computational cost that increases only linearly with the number of points selected. This is illustrated by two numerical examples, with the target measure $\mu$ being uniform (a space-filling design application) and with $\mu$ a Gaussian mixture. They suggest that the bounds derived in the paper are overly pessimistic, in particular for SBQ. The sources of this pessimism are identified but seem difficult to counter.  ( 2 min )
    A Locally Adaptive Interpretable Regression. (arXiv:2005.03350v4 [stat.ML] UPDATED)
    Machine learning models with both good predictability and high interpretability are crucial for decision support systems. Linear regression is one of the most interpretable prediction models. However, the linearity in a simple linear regression worsens its predictability. In this work, we introduce a locally adaptive interpretable regression (LoAIR). In LoAIR, a metamodel parameterized by neural networks predicts percentile of a Gaussian distribution for the regression coefficients for a rapid adaptation. Our experimental results on public benchmark datasets show that our model not only achieves comparable or better predictive performance than the other state-of-the-art baselines but also discovers some interesting relationships between input and target variables such as a parabolic relationship between CO2 emissions and Gross National Product (GNP). Therefore, LoAIR is a step towards bridging the gap between econometrics, statistics, and machine learning by improving the predictive ability of linear regression without depreciating its interpretability.  ( 2 min )
    Reappraising Domain Generalization in Neural Networks. (arXiv:2110.07981v3 [cs.LG] UPDATED)
    Given that Neural Networks generalize unreasonably well in the IID setting (with benign overfitting and betterment in performance with more parameters), OOD presents a consistent failure case to better the understanding of how they learn. This paper focuses on Domain Generalization (DG), which is perceived as the front face of OOD generalization. We find that the presence of multiple domains incentivizes domain agnostic learning and is the primary reason for generalization in Tradition DG. We show that the state-of-the-art results can be obtained by borrowing ideas from IID generalization and the DG tailored methods fail to add any performance gains. Furthermore, we perform explorations beyond the Traditional DG (TDG) formulation and propose a novel ClassWise DG (CWDG) benchmark, where for each class, we randomly select one of the domains and keep it aside for testing. Despite being exposed to all domains during training, CWDG is more challenging than TDG evaluation. We propose a novel iterative domain feature masking approach, achieving state-of-the-art results on the CWDG benchmark. Overall, while explaining these observations, our work furthers insights into the learning mechanisms of neural networks.  ( 2 min )
    High Dimensional Quantum Machine Learning With Small Quantum Computers. (arXiv:2203.13739v2 [quant-ph] UPDATED)
    Quantum computers hold great promise to enhance machine learning, but their current qubit counts restrict the realisation of this promise. In an attempt to placate this limitation techniques can be applied for evaluating a quantum circuit using a machine with fewer qubits than the circuit naively requires. These techniques work by evaluating many smaller circuits on the smaller machine, that are then combined in a polynomial to replicate the output of the larger machine. This scheme requires more circuit evaluations than are practical for general circuits. However, we investigate the possibility that for certain applications many of these subcircuits are superfluous, and that a much smaller sum is sufficient to estimate the full circuit. We construct a machine learning model that may be capable of approximating the outputs of the larger circuit with much fewer circuit evaluations. We successfully apply our model to the task of digit recognition, using simulated quantum computers much smaller than the data dimension. The model is also applied to the task of approximating a random 10 qubit PQC with simulated access to a 5 qubit computer, even with only relatively modest number of circuits our model provides an accurate approximation of the 10 qubit PQCs output, superior to a neural network attempt. The developed method might be useful for implementing quantum models on larger data throughout the NISQ era.  ( 2 min )
    Curriculum Learning for Dense Retrieval Distillation. (arXiv:2204.13679v1 [cs.IR])
    Recent work has shown that more effective dense retrieval models can be obtained by distilling ranking knowledge from an existing base re-ranking model. In this paper, we propose a generic curriculum learning based optimization framework called CL-DRD that controls the difficulty level of training data produced by the re-ranking (teacher) model. CL-DRD iteratively optimizes the dense retrieval (student) model by increasing the difficulty of the knowledge distillation data made available to it. In more detail, we initially provide the student model coarse-grained preference pairs between documents in the teacher's ranking and progressively move towards finer-grained pairwise document ordering requirements. In our experiments, we apply a simple implementation of the CL-DRD framework to enhance two state-of-the-art dense retrieval models. Experiments on three public passage retrieval datasets demonstrate the effectiveness of our proposed framework.  ( 2 min )
    Unified Simulation, Perception, and Generation of Human Behavior. (arXiv:2204.13678v1 [cs.CV])
    Understanding and modeling human behavior is fundamental to almost any computer vision and robotics applications that involve humans. In this thesis, we take a holistic approach to human behavior modeling and tackle its three essential aspects -- simulation, perception, and generation. Throughout the thesis, we show how the three aspects are deeply connected and how utilizing and improving one aspect can greatly benefit the other aspects. We also discuss the lessons learned and our vision for what is next for human behavior modeling.
    Signal Recovery with Non-Expansive Generative Network Priors. (arXiv:2204.13599v1 [eess.SP])
    We study compressive sensing with a deep generative network prior. Initial theoretical guarantees for efficient recovery from compressed linear measurements have been developed for signals in the range of a ReLU network with Gaussian weights and logarithmic expansivity: that is when each layer is larger than the previous one by a logarithmic factor. It was later shown that constant expansivity is sufficient for recovery. It has remained open whether the expansivity can be relaxed allowing for networks with contractive layers, as often the case of real generators. In this work we answer this question, proving that a signal in the range of a Gaussian generative network can be recovered from a few linear measurements provided that the width of the layers is proportional to the input layer size (up to log factors). This condition allows the generative network to have contractive layers. Our result is based on showing that Gaussian matrices satisfy a matrix concentration inequality, which we term Range Restricted Weight Distribution Condition (R2WDC), and weakens the Weight Distribution Condition (WDC) upon which previous theoretical guarantees were based on. The WDC has also been used to analyze other signal recovery problems with generative network priors. By replacing the WDC with the R2WDC, we are able to extend previous results for signal recovery with expansive generative network priors to non-expansive ones. We discuss these extensions for phase retrieval, denoising, and spiked matrix recovery.
    Should Machine Learning Models Report to Us When They Are Clueless?. (arXiv:2203.12131v2 [cs.LG] UPDATED)
    The right to AI explainability has consolidated as a consensus in the research community and policy-making. However, a key component of explainability has been missing: extrapolation, which describes the extent to which AI models can be clueless when they encounter unfamiliar samples (i.e., samples outside the convex hull of their training sets, as we will explain). We report that AI models extrapolate outside their range of familiar data, frequently and without notifying the users and stakeholders. Knowing whether a model has extrapolated or not is a fundamental insight that should be included in explaining AI models in favor of transparency and accountability. Instead of dwelling on the negatives, we offer ways to clear the roadblocks in promoting AI transparency. Our analysis commentary accompanying practical clauses useful to include in AI regulations such as the National AI Initiative Act in the US and the AI Act by the European Commission.
    Revisiting Bayesian Autoencoders with MCMC. (arXiv:2104.05915v2 [cs.LG] UPDATED)
    Autoencoders gained popularity in the deep learning revolution given their ability to compress data and provide dimensionality reduction. Although prominent deep learning methods have been used to enhance autoencoders, the need to provide robust uncertainty quantification remains a challenge. This has been addressed with variational autoencoders so far. Bayesian inference via Markov Chain Monte Carlo (MCMC) sampling has faced several limitations for large models; however, recent advances in parallel computing and advanced proposal schemes have opened routes less traveled. This paper presents Bayesian autoencoders powered by MCMC sampling implemented using parallel computing and Langevin-gradient proposal distribution. The results indicate that the proposed Bayesian autoencoder provides similar performance accuracy when compared to related methods in the literature. Furthermore, it provides uncertainty quantification in the reduced data representation. This motivates further applications of the Bayesian autoencoder framework for other deep learning models.
    Model architecture can transform catastrophic forgetting into positive transfer. (arXiv:2108.03940v3 [cs.LG] UPDATED)
    The work of McCloskey and Cohen popularized the concept of catastrophic interference. They used a neural network that tried to learn addition using two groups of examples as two different tasks. In their case, learning the second task rapidly deteriorated the acquired knowledge about the previous one. We hypothesize that this could be a symptom of a fundamental problem: addition is an algorithmic task that should not be learned through pattern recognition. Therefore, other model architectures better suited for this task would avoid catastrophic forgetting. We use a neural network with a different architecture that can be trained to recover the correct algorithm for the addition of binary numbers. This neural network includes conditional clauses that are naturally treated within the back-propagation algorithm. We test it in the setting proposed by McCloskey and Cohen and training on random additions one by one. The neural network not only does not suffer from catastrophic forgetting but it improves its predictive power on unseen pairs of numbers as training progresses. We also show that this is a robust effect, also present when averaging many simulations. This work emphasizes the importance that neural network architecture has for the emergence of catastrophic forgetting and introduces a neural network that is able to learn an algorithm.
    Ultrasound Shear Wave Elasticity Imaging with Spatio-Temporal Deep Learning. (arXiv:2204.05745v2 [eess.IV] UPDATED)
    Ultrasound shear wave elasticity imaging is a valuable tool for quantifying the elastic properties of tissue. Typically, the shear wave velocity is derived and mapped to an elasticity value, which neglects information such as the shape of the propagating shear wave or push sequence characteristics. We present 3D spatio-temporal CNNs for fast local elasticity estimation from ultrasound data. This approach is based on retrieving elastic properties from shear wave propagation within small local regions. A large training data set is acquired with a robot from homogeneous gelatin phantoms ranging from 17.42 kPa to 126.05 kPa with various push locations. The results show that our approach can estimate elastic properties on a pixelwise basis with a mean absolute error of 5.01+-4.37 kPa. Furthermore, we estimate local elasticity independent of the push location and can even perform accurate estimates inside the push region. For phantoms with embedded inclusions, we report a 53.93% lower MAE (7.50 kPa) and on the background of 85.24% (1.64 kPa) compared to a conventional shear wave method. Overall, our method offers fast local estimations of elastic properties with small spatio-temporal window sizes.
    Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks. (arXiv:2204.10496v2 [cs.CV] UPDATED)
    Cross-modal encoders for vision-language (VL) tasks are often pretrained with carefully curated vision-language datasets. While these datasets reach an order of 10 million samples, the labor cost is prohibitive to scale further. Conversely, unimodal encoders are pretrained with simpler annotations that are less cost-prohibitive, achieving scales of hundreds of millions to billions. As a result, unimodal encoders have achieved state-of-art (SOTA) on many downstream tasks. However, challenges remain when applying to VL tasks. The pretraining data is not optimal for cross-modal architectures and requires heavy computational resources. In addition, unimodal architectures lack cross-modal interactions that have demonstrated significant benefits for VL tasks. Therefore, how to best leverage pretrained unimodal encoders for VL tasks is still an area of active research. In this work, we propose a method to leverage unimodal vision and text encoders for VL tasks that augment existing VL approaches while conserving computational complexity. Specifically, we propose Multimodal Adaptive Distillation (MAD), which adaptively distills useful knowledge from pretrained encoders to cross-modal VL encoders. Second, to better capture nuanced impacts on VL task performance, we introduce an evaluation protocol that includes Visual Commonsense Reasoning (VCR), Visual Entailment (SNLI-VE), and Visual Question Answering (VQA), across a variety of data constraints and conditions of domain shift. Experiments demonstrate that MAD leads to consistent gains in the low-shot, domain-shifted, and fully-supervised conditions on VCR, SNLI-VE, and VQA, achieving SOTA performance on VCR compared to other single models pretrained with image-text data. Finally, MAD outperforms concurrent works utilizing pretrained vision encoder from CLIP. Code will be made available.
    Cooperative Multi-Agent Reinforcement Learning with Hypergraph Convolution. (arXiv:2112.06771v2 [cs.AI] UPDATED)
    Recent years have witnessed the great success of multi-agent systems (MAS). Value decomposition, which decomposes joint action values into individual action values, has been an important work in MAS. However, many value decomposition methods ignore the coordination among different agents, leading to the notorious "lazy agents" problem. To enhance the coordination in MAS, this paper proposes HyperGraph CoNvolution MIX (HGCN-MIX), a method that incorporates hypergraph convolution with value decomposition. HGCN-MIX models agents as well as their relationships as a hypergraph, where agents are nodes and hyperedges among nodes indicate that the corresponding agents can coordinate to achieve larger rewards. Then, it trains a hypergraph that can capture the collaborative relationships among agents. Leveraging the learned hypergraph to consider how other agents' observations and actions affect their decisions, the agents in a MAS can better coordinate. We evaluate HGCN-MIX in the StarCraft II multi-agent challenge benchmark. The experimental results demonstrate that HGCN-MIX can train joint policies that outperform or achieve a similar level of performance as the current state-of-the-art techniques. We also observe that HGCN-MIX has an even more significant improvement of performance in the scenarios with a large amount of agents. Besides, we conduct additional analysis to emphasize that when the hypergraph learns more relationships, HGCN-MIX can train stronger joint policies.
    An Explainable Regression Framework for Predicting Remaining Useful Life of Machines. (arXiv:2204.13574v1 [cs.LG])
    Prediction of a machine's Remaining Useful Life (RUL) is one of the key tasks in predictive maintenance. The task is treated as a regression problem where Machine Learning (ML) algorithms are used to predict the RUL of machine components. These ML algorithms are generally used as a black box with a total focus on the performance without identifying the potential causes behind the algorithms' decisions and their working mechanism. We believe, the performance (in terms of Mean Squared Error (MSE), etc.,) alone is not enough to build the trust of the stakeholders in ML prediction rather more insights on the causes behind the predictions are needed. To this aim, in this paper, we explore the potential of Explainable AI (XAI) techniques by proposing an explainable regression framework for the prediction of machines' RUL. We also evaluate several ML algorithms including classical and Neural Networks (NNs) based solutions for the task. For the explanations, we rely on two model agnostic XAI methods namely Local Interpretable Model-Agnostic Explanations (LIME) and Shapley Additive Explanations (SHAP). We believe, this work will provide a baseline for future research in the domain.
    Structured Pruning Learns Compact and Accurate Models. (arXiv:2204.00408v2 [cs.CL] UPDATED)
    The growing size of neural language models has led to increased attention in model compression. The two predominant approaches are pruning, which gradually removes weights from a pre-trained model, and distillation, which trains a smaller compact model to match a larger one. Pruning methods can significantly reduce the model size but hardly achieve large speedups as distillation. However, distillation methods require large amounts of unlabeled data and are expensive to train. In this work, we propose a task-specific structured pruning method CoFi (Coarse- and Fine-grained Pruning), which delivers highly parallelizable subnetworks and matches the distillation methods in both accuracy and latency, without resorting to any unlabeled data. Our key insight is to jointly prune coarse-grained (e.g., layers) and fine-grained (e.g., heads and hidden units) modules, which controls the pruning decision of each parameter with masks of different granularity. We also devise a layerwise distillation strategy to transfer knowledge from unpruned to pruned models during optimization. Our experiments on GLUE and SQuAD datasets show that CoFi yields models with over 10x speedups with a small accuracy drop, showing its effectiveness and efficiency compared to previous pruning and distillation approaches.
    Scan Specific Artifact Reduction in K-space (SPARK) Neural Networks Synergize with Physics-based Reconstruction to Accelerate MRI. (arXiv:2104.01188v3 [eess.SP] UPDATED)
    Purpose: To develop a scan-specific model that estimates and corrects k-space errors made when reconstructing accelerated Magnetic Resonance Imaging (MRI) data. Methods: Scan-Specific Artifact Reduction in k-space (SPARK) trains a convolutional-neural-network to estimate and correct k-space errors made by an input reconstruction technique by back-propagating from the mean-squared-error loss between an auto-calibration signal (ACS) and the input technique's reconstructed ACS. First, SPARK is applied to GRAPPA and demonstrates improved robustness over other scan-specific models, such as RAKI and residual-RAKI. Subsequent experiments demonstrate that SPARK synergizes with residual-RAKI to improve reconstruction performance. SPARK also improves reconstruction quality when applied to advanced acquisition and reconstruction techniques like 2D virtual coil (VC-) GRAPPA, 2D LORAKS, 3D GRAPPA without an integrated ACS region, and 2D/3D wave-encoded images. Results: SPARK yields 1.5x - 2x RMSE reduction when applied to GRAPPA and improves robustness to ACS size for various acceleration rates in comparison to other scan-specific techniques. When applied to advanced reconstruction techniques such as residual-RAKI, 2D VC-GRAPPA and LORAKS, SPARK achieves up to 20% RMSE improvement. SPARK with 3D GRAPPA also improves performance by ~2x and perceived image quality without a fully sampled ACS region. Finally, SPARK synergizes with non-cartesian 2D and 3D wave-encoding imaging by reducing RMSE between 20-25% and providing qualitative improvements. Conclusion: SPARK synergizes with physics-based acquisition and reconstruction techniques to improve accelerated MRI by training scan-specific models to estimate and correct reconstruction errors in k-space.
    Standardized Evaluation of Machine Learning Methods for Evolving Data Streams. (arXiv:2204.13625v1 [cs.LG])
    Due to the unspecified and dynamic nature of data streams, online machine learning requires powerful and flexible solutions. However, evaluating online machine learning methods under realistic conditions is difficult. Existing work therefore often draws on different heuristics and simulations that do not necessarily produce meaningful and reliable results. Indeed, in the absence of common evaluation standards, it often remains unclear how online learning methods will perform in practice or in comparison to similar work. In this paper, we propose a comprehensive set of properties for high-quality machine learning in evolving data streams. In particular, we discuss sensible performance measures and evaluation strategies for online predictive modelling, online feature selection and concept drift detection. As one of the first works, we also look at the interpretability of online learning methods. The proposed evaluation standards are provided in a new Python framework called float. Float is completely modular and allows the simultaneous integration of common libraries, such as scikit-multiflow or river, with custom code. Float is open-sourced and can be accessed at https://github.com/haugjo/float. In this sense, we hope that our work will contribute to more standardized, reliable and realistic testing and comparison of online machine learning methods.
    Representative period selection for power system planning using autoencoder-based dimensionality reduction. (arXiv:2204.13608v1 [cs.LG])
    Power sector capacity expansion models (CEMs) that are used for studying future low-carbon grid scenarios must incorporate detailed representation of grid operations. Often CEMs are formulated to model grid operations over representative periods that are sampled from the original input data using clustering algorithms. However, such representative period selection (RPS) methods are limited by the declining efficacy of the clustering algorithm with increasing dimensionality of the input data and do not consider the relative importance of input data variations on CEM outcomes. Here, we propose a RPS method that addresses these limitations by incorporating dimensionality reduction, accomplished via neural network based autoencoders, prior to clustering. Such dimensionality reduction not only improves the performance of the clustering algorithm, but also facilitates using additional features, such as estimated outputs produced from parallel solutions of simplified versions of the CEM for each disjoint period in the input data (e.g. 1 week). The impact of incorporating dimensionality reduction as part of RPS methods is quantified through the error in outcomes of the corresponding reduced-space CEM vs. the full space CEM. Extensive numerical experimentation across various networks and range of technology and policy scenarios establish the superiority of the dimensionality-reduction based RPS methods.
    Toward Compositional Generalization in Object-Oriented World Modeling. (arXiv:2204.13661v1 [cs.LG])
    Compositional generalization is a critical ability in learning and decision-making. We focus on the setting of reinforcement learning in object-oriented environments to study compositional generalization in world modeling. We (1) formalize the compositional generalization problem with an algebraic approach and (2) study how a world model can achieve that. We introduce a conceptual environment, Object Library, and two instances, and deploy a principled pipeline to measure the generalization ability. Motivated by the formulation, we analyze several methods with exact} or no compositional generalization ability using our framework, and design a differentiable approach, Homomorphic Object-oriented World Model (HOWM), that achieves approximate but more efficient compositional generalization.
    DOTIN: Dropping Task-Irrelevant Nodes for GNNs. (arXiv:2204.13429v1 [cs.LG])
    Scalability is an important consideration for deep graph neural networks. Inspired by the conventional pooling layers in CNNs, many recent graph learning approaches have introduced the pooling strategy to reduce the size of graphs for learning, such that the scalability and efficiency can be improved. However, these pooling-based methods are mainly tailored to a single graph-level task and pay more attention to local information, limiting their performance in multi-task settings which often require task-specific global information. In this paper, departure from these pooling-based efforts, we design a new approach called DOTIN (\underline{D}r\underline{o}pping \underline{T}ask-\underline{I}rrelevant \underline{N}odes) to reduce the size of graphs. Specifically, by introducing $K$ learnable virtual nodes to represent the graph embeddings targeted to $K$ different graph-level tasks, respectively, up to 90\% raw nodes with low attentiveness with an attention model -- a transformer in this paper, can be adaptively dropped without notable performance decreasing. Achieving almost the same accuracy, our method speeds up GAT by about 50\% on graph-level tasks including graph classification and graph edit distance (GED) with about 60\% less memory, on D\&D dataset. Code will be made publicly available in https://github.com/Sherrylone/DOTIN.
    Prescriptive and Descriptive Approaches to Machine-Learning Transparency. (arXiv:2204.13582v1 [cs.SE])
    Specialized documentation techniques have been developed to communicate key facts about machine-learning (ML) systems and the datasets and models they rely on. Techniques such as Datasheets, FactSheets, and Model Cards have taken a mainly descriptive approach, providing various details about the system components. While the above information is essential for product developers and external experts to assess whether the ML system meets their requirements, other stakeholders might find it less actionable. In particular, ML engineers need guidance on how to mitigate potential shortcomings in order to fix bugs or improve the system's performance. We survey approaches that aim to provide such guidance in a prescriptive way. We further propose a preliminary approach, called Method Cards, which aims to increase the transparency and reproducibility of ML systems by providing prescriptive documentation of commonly-used ML methods and techniques. We showcase our proposal with an example in small object detection, and demonstrate how Method Cards can communicate key considerations for model developers. We further highlight avenues for improving the user experience of ML engineers based on Method Cards.
    Federated Learning on Heterogeneous and Long-Tailed Data via Classifier Re-Training with Federated Features. (arXiv:2204.13399v1 [cs.LG])
    Federated learning (FL) provides a privacy-preserving solution for distributed machine learning tasks. One challenging problem that severely damages the performance of FL models is the co-occurrence of data heterogeneity and long-tail distribution, which frequently appears in real FL applications. In this paper, we reveal an intriguing fact that the biased classifier is the primary factor leading to the poor performance of the global model. Motivated by the above finding, we propose a novel and privacy-preserving FL method for heterogeneous and long-tailed data via Classifier Re-training with Federated Features (CReFF). The classifier re-trained on federated features can produce comparable performance as the one re-trained on real data in a privacy-preserving manner without information leakage of local data or class distribution. Experiments on several benchmark datasets show that the proposed CReFF is an effective solution to obtain a promising FL model under heterogeneous and long-tailed data. Comparative results with the state-of-the-art FL methods also validate the superiority of CReFF. Our code is available at https://github.com/shangxinyi/CReFF-FL.
    Worst-Case Dynamic Power Distribution Network Noise Prediction Using Convolutional Neural Network. (arXiv:2204.13109v1 [cs.LG])
    Worst-case dynamic PDN noise analysis is an essential step in PDN sign-off to ensure the performance and reliability of chips. However, with the growing PDN size and increasing scenarios to be validated, it becomes very time- and resource-consuming to conduct full-stack PDN simulation to check the worst-case noise for different test vectors. Recently, various works have proposed machine learning based methods for supply noise prediction, many of which still suffer from large training overhead, inefficiency, or non-scalability. Thus, this paper proposed an efficient and scalable framework for the worst-case dynamic PDN noise prediction. The framework first reduces the spatial and temporal redundancy in the PDN and input current vector, and then employs efficient feature extraction as well as a novel convolutional neural network architecture to predict the worst-case dynamic PDN noise. Experimental results show that the proposed framework consistently outperforms the commercial tool and the state-of-the-art machine learning method with only 0.63-1.02% mean relative error and 25-69$\times$ speedup.
    Use All The Labels: A Hierarchical Multi-Label Contrastive Learning Framework. (arXiv:2204.13207v1 [cs.CV])
    Current contrastive learning frameworks focus on leveraging a single supervisory signal to learn representations, which limits the efficacy on unseen data and downstream tasks. In this paper, we present a hierarchical multi-label representation learning framework that can leverage all available labels and preserve the hierarchical relationship between classes. We introduce novel hierarchy preserving losses, which jointly apply a hierarchical penalty to the contrastive loss, and enforce the hierarchy constraint. The loss function is data driven and automatically adapts to arbitrary multi-label structures. Experiments on several datasets show that our relationship-preserving embedding performs well on a variety of tasks and outperform the baseline supervised and self-supervised approaches. Code is available at https://github.com/salesforce/hierarchicalContrastiveLearning.
    Neural network controllers for uncertain linear systems. (arXiv:2204.13209v1 [eess.SY])
    We consider the design of reliable neural network (NN)-based approximations of traditional stabilizing controllers for linear systems affected by polytopic uncertainty, including controllers with variable structure and those based on a minimal selection policy. We develop a systematic procedure to certify the closed-loop stability and performance of a polytopic system when a rectified linear unit (ReLU)-based approximation replaces such traditional controllers. We provide sufficient conditions to ensure stability involving the worst-case approximation error and the Lipschitz constant characterizing the error function between ReLU-based and traditional controller-based state-to-input mappings, and further provide offline, mixed-integer optimization-based methods that allow us to compute those quantities exactly.
    Epileptic Seizure Classification Using Combined Labels and a Genetic Algorithm. (arXiv:2110.01742v4 [eess.SP] UPDATED)
    Epilepsy affects 50 million people worldwide and is one of the most common serious neurological disorders. Seizure detection and classification is a valuable tool for diagnosing and maintaining the condition. An automated classification algorithm will allow for accurate diagnosis. Utilising the Temple University Hospital (TUH) Seizure Corpus, six seizure types are compared; absence, complex partial, myoclonic, simple partial, tonic and tonic- clonic models. This study proposes a method that utilises unique features with a novel parallel classifier - Parallel Genetic Naive Bayes (NB) Seizure Classifier (PGNBSC). The PGNBSC algorithm searches through the features and by reclassifying the data each time, the algorithm will create a matrix for optimum search criteria. Ictal states from the EEGs are segmented into 1.8 s windows, where the epochs are then further decomposed into 13 different features from the first intrinsic mode function (IMF). The features are compared using an original NB classifier in the first model. This is improved upon in a second model by using a genetic algorithm (Binary Grey Wolf Optimisation, Option 1) with a NB classifier. The third model uses a combination of the simple partial and complex partial seizures to provide the highest classification accuracy for each of the six seizures amongst the three models (20%, 53%, and 85% for first, second, and third model, respectively).
    A Decision Model for Federated Learning Architecture Pattern Selection. (arXiv:2204.13291v1 [cs.LG])
    Federated learning is growing fast in both academia and industry to resolve data hungriness and privacy issues in machine learning. A federated learning system being widely distributed with different components and stakeholders requires software system design thinking. For instance, multiple patterns and tactics have been summarised by researchers that cover various aspects, from client management, training configuration, model deployment, etc. However, the multitude of patterns leaves the designers confused about when and which pattern to adopt or adapt. Therefore, in this paper, we present a set of decision models to assist designers and architects who have limited knowledge in federated learning, in selecting architectural patterns for federated learning architecture design. Each decision model maps functional and non-functional requirements of federated learning systems to a set of patterns. we also clarify the trade-offs that may be implicit in the patterns. We evaluated the decision model through a set of interviews with practitioners to assess the correctness and usefulness in guiding the architecture design process through various design decision options.
    Efficient-VDVAE: Less is more. (arXiv:2203.13751v2 [cs.LG] UPDATED)
    Hierarchical VAEs have emerged in recent years as a reliable option for maximum likelihood estimation. However, instability issues and demanding computational requirements have hindered research progress in the area. We present simple modifications to the Very Deep VAE to make it converge up to $2.6\times$ faster, save up to $20\times$ in memory load and improve stability during training. Despite these changes, our models achieve comparable or better negative log-likelihood performance than current state-of-the-art models on all $7$ commonly used image datasets we evaluated on. We also make an argument against using 5-bit benchmarks as a way to measure hierarchical VAE's performance due to undesirable biases caused by the 5-bit quantization. Additionally, we empirically demonstrate that roughly $3\%$ of the hierarchical VAE's latent space dimensions is sufficient to encode most of the image information, without loss of performance, opening up the doors to efficiently leverage the hierarchical VAEs' latent space in downstream tasks. We release our source code and models at https://github.com/Rayhane-mamah/Efficient-VDVAE .
    Attention Mechanism in Neural Networks: Where it Comes and Where it Goes. (arXiv:2204.13154v1 [cs.LG])
    A long time ago in the machine learning literature, the idea of incorporating a mechanism inspired by the human visual system into neural networks was introduced. This idea is named the attention mechanism, and it has gone through a long development period. Today, many works have been devoted to this idea in a variety of tasks. Remarkable performance has recently been demonstrated. The goal of this paper is to provide an overview from the early work on searching for ways to implement attention idea with neural networks until the recent trends. This review emphasizes the important milestones during this progress regarding different tasks. By this way, this study aims to provide a road map for researchers to explore the current development and get inspired for novel approaches beyond the attention.
    Multi-Agent Reinforcement Learning for Network Load Balancing in Data Center. (arXiv:2201.11727v3 [cs.DC] UPDATED)
    This paper presents the network load balancing problem, a challenging real-world task for multi-agent reinforcement learning (MARL) methods. Traditional heuristic solutions like Weighted-Cost Multi-Path (WCMP) and Local Shortest Queue (LSQ) are less flexible to the changing workload distributions and arrival rates, with a poor balance among multiple load balancers. The cooperative network load balancing task is formulated as a Dec-POMDP problem, which naturally induces the MARL methods. To bridge the reality gap for applying learning-based methods, all methods are directly trained and evaluated on an emulation system from moderate-to large-scale. Experiments on realistic testbeds show that the independent and "selfish" load balancing strategies are not necessarily the globally optimal ones, while the proposed MARL solution has a superior performance over different realistic settings. Additionally, the potential difficulties of MARL methods for network load balancing are analysed, which helps to draw the attention of the learning and network communities to such challenges.
    DIANES: A DEI Audit Toolkit for News Sources. (arXiv:2203.11383v2 [cs.IR] UPDATED)
    Professional news media organizations have always touted the importance that they give to multiple perspectives. However, in practice the traditional approach to all-sides has favored people in the dominant culture. Hence it has come under ethical critique under the new norms of diversity, equity, and inclusion (DEI). When DEI is applied to journalism, it goes beyond conventional notions of impartiality and bias and instead democratizes the journalistic practice of sourcing -- who is quoted or interviewed, who is not, how often, from which demographic group, gender, and so forth. There is currently no real-time or on-demand tool in the hands of reporters to analyze the persons they quote. In this paper, we present DIANES, a DEI Audit Toolkit for News Sources. It consists of a natural language processing pipeline on the backend to extract quotes, speakers, titles, and organizations from news articles in real time. On the frontend, DIANES offers the WordPress plugins, a Web monitor, and a DEI annotation API service, to help news media monitor their own quoting patterns and push themselves towards DEI norms.
    Pedagogical Rule Extraction to Learn Interpretable Models - an Empirical Study. (arXiv:2112.13285v2 [cs.LG] UPDATED)
    Machine-learning models are ubiquitous. In some domains, for instance, in medicine, the models' predictions must be interpretable. Decision trees, classification rules, and subgroup discovery are three broad categories of supervised machine-learning models presenting knowledge in the form of interpretable rules. The accuracy of these models learned from small datasets is usually low. Obtaining larger datasets is often hard to impossible. Pedagogical rule extraction methods could help to learn better rules from small data by augmenting a dataset employing statistical models and using it to learn a rule-based model. However, existing evaluation of these methods is often inconclusive, and they were not compared so far. Our framework PRELIM unifies existing pedagogical rule extraction techniques. In the extensive experiments, we identified promising PRELIM configurations not studied before.
    Continual learning-based probabilistic slow feature analysis for multimode dynamic process monitoring. (arXiv:2202.11295v2 [cs.LG] UPDATED)
    In this paper, a novel multimode dynamic process monitoring approach is proposed by extending elastic weight consolidation (EWC) to probabilistic slow feature analysis (PSFA) in order to extract multimode slow features for online monitoring. EWC was originally introduced in the setting of machine learning of sequential multi-tasks with the aim of avoiding catastrophic forgetting issue, which equally poses as a major challenge in multimode dynamic process monitoring. When a new mode arrives, a set of data should be collected so that this mode can be identified by PSFA and prior knowledge. Then, a regularization term is introduced to prevent new data from significantly interfering with the learned knowledge, where the parameter importance measures are estimated. The proposed method is denoted as PSFA-EWC, which is updated continually and capable of achieving excellent performance for successive modes. Different from traditional multimode monitoring algorithms, PSFA-EWC furnishes backward and forward transfer ability. The significant features of previous modes are retained while consolidating new information, which may contribute to learning new relevant modes. Compared with several known methods, the effectiveness of the proposed method is demonstrated via a continuous stirred tank heater and a practical coal pulverizing system.
    Low-Rank Approximation with $1/\epsilon^{1/3}$ Matrix-Vector Products. (arXiv:2202.05120v2 [cs.DS] UPDATED)
    We study iterative methods based on Krylov subspaces for low-rank approximation under any Schatten-$p$ norm. Here, given access to a matrix $A$ through matrix-vector products, an accuracy parameter $\epsilon$, and a target rank $k$, the goal is to find a rank-$k$ matrix $Z$ with orthonormal columns such that $\| A(I -ZZ^\top)\|_{S_p} \leq (1+\epsilon)\min_{U^\top U = I_k} \|A(I - U U^\top)\|_{S_p}$, where $\|M\|_{S_p}$ denotes the $\ell_p$ norm of the the singular values of $M$. For the special cases of $p=2$ (Frobenius norm) and $p = \infty$ (Spectral norm), Musco and Musco (NeurIPS 2015) obtained an algorithm based on Krylov methods that uses $\tilde{O}(k/\sqrt{\epsilon})$ matrix-vector products, improving on the na\"ive $\tilde{O}(k/\epsilon)$ dependence obtainable by the power method, where $\tilde{O}$ suppresses poly$(\log(dk/\epsilon))$ factors. Our main result is an algorithm that uses only $\tilde{O}(kp^{1/6}/\epsilon^{1/3})$ matrix-vector products, and works for all $p \geq 1$. For $p = 2$ our bound improves the previous $\tilde{O}(k/\epsilon^{1/2})$ bound to $\tilde{O}(k/\epsilon^{1/3})$. Since the Schatten-$p$ and Schatten-$\infty$ norms are the same up to a $1+ \epsilon$ factor when $p \geq (\log d)/\epsilon$, our bound recovers the result of Musco and Musco for $p = \infty$. Further, we prove a matrix-vector query lower bound of $\Omega(1/\epsilon^{1/3})$ for any fixed constant $p \geq 1$, showing that surprisingly $\tilde{\Theta}(1/\epsilon^{1/3})$ is the optimal complexity for constant~$k$. To obtain our results, we introduce several new techniques, including optimizing over multiple Krylov subspaces simultaneously, and pinching inequalities for partitioned operators. Our lower bound for $p \in [1,2]$ uses the Araki-Lieb-Thirring trace inequality, whereas for $p>2$, we appeal to a norm-compression inequality for aligned partitioned operators.
    Experimental quantum advantage with quantum coupon collector. (arXiv:2112.07884v2 [quant-ph] UPDATED)
    An increasing number of communication and computational schemes with quantum advantages have recently been proposed, which implies that quantum technology has fertile application prospects. However, demonstrating these schemes experimentally continues to be a central challenge because of the difficulty in preparing high-dimensional states or highly entangled states. In this study, we introduce and analyse a quantum coupon collector protocol by employing coherent states and simple linear optical elements, which was successfully demonstrated using realistic experimental equipment. We showed that our protocol can significantly reduce the number of samples needed to learn a specific set compared with the classical limit of the coupon collector problem. We also discuss the potential values and expansions of the quantum coupon collector by constructing a quantum blind box game. The information transmitted by the proposed game also broke the classical limit. These results strongly prove the advantages of quantum mechanics in machine learning and communication complexity.
    On Exploiting Layerwise Gradient Statistics for Effective Training of Deep Neural Networks. (arXiv:2203.13273v3 [cs.LG] UPDATED)
    Adam and AdaBelief compute and make use of elementwise adaptive stepsizes in training deep neural networks (DNNs) by tracking the exponential moving average (EMA) of the squared-gradient g_t^2 and the squared prediction error (m_t-g_t)^2, respectively, where m_t is the first momentum at iteration t and can be viewed as a prediction of g_t. In this work, we investigate if layerwise gradient statistics can be expoited in Adam and AdaBelief to allow for more effective training of DNNs. We address the above research question in two steps. Firstly, we slightly modify Adam and AdaBelief by introducing layerwise adaptive stepsizes in their update procedures via either pre- or post-processing. Our empirical results indicate that the slight modification produces comparable performance for training VGG and ResNet models over CIFAR10 and CIFAR100, suggesting that layer-wise gradient statistics play an important role towards the success of Adam and AdaBelief for at least certian DNN tasks. In the second step, we propose Aida, a new optimisation method, with the objective that the elementwise stepsizes within each layer have significantly smaller statistical variances, and the layerwise average stepsizes are much more compact across all the layers. Motivated by the fact that (m_t-g_t)^2 in AdaBelief is conservative in comparison to g_t^2 in Adam in terms of layerwise statistical averages and variances, Aida is designed by tracking a more conservative function of m_t and g_t than (m_t-g_t)^2 via layerwise vector projections. Experimental results show that Aida produces either competitive or better performance with respect to a number of existing methods including Adam and AdaBelief for a set of challenging DNN tasks.
    Globally Optimal Hierarchical Reinforcement Learning for Linearly-Solvable Markov Decision Processes. (arXiv:2106.15380v3 [cs.LG] UPDATED)
    In this work we present a novel approach to hierarchical reinforcement learning for linearly-solvable Markov decision processes. Our approach assumes that the state space is partitioned, and the subtasks consist in moving between the partitions. We represent value functions on several levels of abstraction, and use the compositionality of subtasks to estimate the optimal values of the states in each partition. The policy is implicitly defined on these optimal value estimates, rather than being decomposed among the subtasks. As a consequence, our approach can learn the globally optimal policy, and does not suffer from the non-stationarity of high-level decisions. If several partitions have equivalent dynamics, the subtasks of those partitions can be shared. If the set of boundary states is smaller than the entire state space, our approach can have significantly smaller sample complexity than that of a flat learner, and we validate this empirically in several experiments.
    Multiplicative Updates for NMF with $\beta$-Divergences under Disjoint Equality Constraints. (arXiv:2010.16223v2 [cs.LG] UPDATED)
    Nonnegative matrix factorization (NMF) is the problem of approximating an input nonnegative matrix, $V$, as the product of two smaller nonnegative matrices, $W$ and $H$. In this paper, we introduce a general framework to design multiplicative updates (MU) for NMF based on $\beta$-divergences ($\beta$-NMF) with disjoint equality constraints, and with penalty terms in the objective function. By disjoint, we mean that each variable appears in at most one equality constraint. Our MU satisfy the set of constraints after each update of the variables during the optimization process, while guaranteeing that the objective function decreases monotonically. We showcase this framework on three NMF models, and show that it competes favorably the state of the art: (1)~$\beta$-NMF with sum-to-one constraints on the columns of $H$, (2) minimum-volume $\beta$-NMF with sum-to-one constraints on the columns of $W$, and (3) sparse $\beta$-NMF with $\ell_2$-norm constraints on the columns of $W$.
    Efficient Geometry-aware 3D Generative Adversarial Networks. (arXiv:2112.07945v2 [cs.CV] UPDATED)
    Unsupervised generation of high-quality multi-view-consistent images and 3D shapes using only collections of single-view 2D photographs has been a long-standing challenge. Existing 3D GANs are either compute-intensive or make approximations that are not 3D-consistent; the former limits quality and resolution of the generated images and the latter adversely affects multi-view consistency and shape quality. In this work, we improve the computational efficiency and image quality of 3D GANs without overly relying on these approximations. We introduce an expressive hybrid explicit-implicit network architecture that, together with other design choices, synthesizes not only high-resolution multi-view-consistent images in real time but also produces high-quality 3D geometry. By decoupling feature generation and neural rendering, our framework is able to leverage state-of-the-art 2D CNN generators, such as StyleGAN2, and inherit their efficiency and expressiveness. We demonstrate state-of-the-art 3D-aware synthesis with FFHQ and AFHQ Cats, among other experiments.
    On Evaluation Metrics for Graph Generative Models. (arXiv:2201.09871v2 [cs.LG] UPDATED)
    In image generation, generative models can be evaluated naturally by visually inspecting model outputs. However, this is not always the case for graph generative models (GGMs), making their evaluation challenging. Currently, the standard process for evaluating GGMs suffers from three critical limitations: i) it does not produce a single score which makes model selection challenging, ii) in many cases it fails to consider underlying edge and node features, and iii) it is prohibitively slow to perform. In this work, we mitigate these issues by searching for scalar, domain-agnostic, and scalable metrics for evaluating and ranking GGMs. To this end, we study existing GGM metrics and neural-network-based metrics emerging from generative models of images that use embeddings extracted from a task-specific network. Motivated by the power of certain Graph Neural Networks (GNNs) to extract meaningful graph representations without any training, we introduce several metrics based on the features extracted by an untrained random GNN. We design experiments to thoroughly test metrics on their ability to measure the diversity and fidelity of generated graphs, as well as their sample and computational efficiency. Depending on the quantity of samples, we recommend one of two random-GNN-based metrics that we show to be more expressive than pre-existing metrics. While we focus on applying these metrics to GGM evaluation, in practice this enables the ability to easily compute the dissimilarity between any two sets of graphs regardless of domain. Our code is released at: https://github.com/uoguelph-mlrg/GGM-metrics.
    Trivial or impossible -- dichotomous data difficulty masks model differences (on ImageNet and beyond). (arXiv:2110.05922v3 [cs.CV] UPDATED)
    "The power of a generalization system follows directly from its biases" (Mitchell 1980). Today, CNNs are incredibly powerful generalisation systems -- but to what degree have we understood how their inductive bias influences model decisions? We here attempt to disentangle the various aspects that determine how a model decides. In particular, we ask: what makes one model decide differently from another? In a meticulously controlled setting, we find that (1.) irrespective of the network architecture or objective (e.g. self-supervised, semi-supervised, vision transformers, recurrent models) all models end up with a similar decision boundary. (2.) To understand these findings, we analysed model decisions on the ImageNet validation set from epoch to epoch and image by image. We find that the ImageNet validation set, among others, suffers from dichotomous data difficulty (DDD): For the range of investigated models and their accuracies, it is dominated by 46.0% "trivial" and 11.5% "impossible" images (beyond label errors). Only 42.5% of the images could possibly be responsible for the differences between two models' decision boundaries. (3.) Only removing the "impossible" and "trivial" images allows us to see pronounced differences between models. (4.) Humans are highly accurate at predicting which images are "trivial" and "impossible" for CNNs (81.4%). This implies that in future comparisons of brains, machines and behaviour, much may be gained from investigating the decisive role of images and the distribution of their difficulties.
    Variational Inference with NoFAS: Normalizing Flow with Adaptive Surrogate for Computationally Expensive Models. (arXiv:2108.12657v2 [cs.LG] UPDATED)
    Fast inference of numerical model parameters from data is an important prerequisite to generate predictive models for a wide range of applications. Use of sampling-based approaches such as Markov chain Monte Carlo may become intractable when each likelihood evaluation is computationally expensive. New approaches combining variational inference with normalizing flow are characterized by a computational cost that grows only linearly with the dimensionality of the latent variable space, and rely on gradient-based optimization instead of sampling, providing a more efficient approach for Bayesian inference about the model parameters. Moreover, the cost of frequently evaluating an expensive likelihood can be mitigated by replacing the true model with an offline trained surrogate model, such as neural networks. However, this approach might generate significant bias when the surrogate is insufficiently accurate around the posterior modes. To reduce the computational cost without sacrificing inferential accuracy, we propose Normalizing Flow with Adaptive Surrogate (NoFAS), an optimization strategy that alternatively updates the normalizing flow parameters and surrogate model parameters. We also propose an efficient sample weighting scheme for surrogate model training that preserves global accuracy while effectively capturing high posterior density regions. We demonstrate the inferential and computational superiority of NoFAS against various benchmarks, including cases where the underlying model lacks identifiability. The source code and numerical experiments used for this study are available at https://github.com/cedricwangyu/NoFAS.
    Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control. (arXiv:2110.01052v4 [cs.LG] UPDATED)
    We introduce a framework for calibrating machine learning models so that their predictions satisfy explicit, finite-sample statistical guarantees. Our calibration algorithm works with any underlying model and (unknown) data-generating distribution and does not require model refitting. The framework addresses, among other examples, false discovery rate control in multi-label classification, intersection-over-union control in instance segmentation, and the simultaneous control of the type-1 error of outlier detection and confidence set coverage in classification or regression. Our main insight is to reframe the risk-control problem as multiple hypothesis testing, enabling techniques and mathematical arguments different from those in the previous literature. We use our framework to provide new calibration methods for several core machine learning tasks with detailed worked examples in computer vision and tabular medical data.
    On Expressivity and Trainability of Quadratic Networks. (arXiv:2110.06081v2 [cs.LG] UPDATED)
    Inspired by the diversity of biological neurons, quadratic artificial neurons can play an important role in deep learning models. The type of quadratic neurons of our interest replaces the inner-product operation in the conventional neuron with a quadratic function. Despite promising results so far achieved by networks of quadratic neurons, there are important issues not well addressed. Theoretically, the superior expressivity of a quadratic network over either a conventional network or a conventional network via quadratic activation is not fully elucidated, which makes the use of quadratic networks not well grounded. Practically, although a quadratic network can be trained via generic backpropagation, it can be subject to a higher risk of collapse than the conventional counterpart. To address these issues, we first apply the spline theory and a measure from algebraic geometry to give two theorems that demonstrate better model expressivity of a quadratic network than the conventional counterpart with or without quadratic activation. Then, we propose an effective and efficient training strategy referred to as ReLinear to stabilize the training process of a quadratic network, thereby unleashing the full potential in its associated machine learning tasks. Comprehensive experiments on popular datasets are performed to support our findings and evaluate the performance of quadratic deep learning.
    Artificial Text Detection via Examining the Topology of Attention Maps. (arXiv:2109.04825v2 [cs.CL] UPDATED)
    The impressive capabilities of recent generative models to create texts that are challenging to distinguish from the human-written ones can be misused for generating fake news, product reviews, and even abusive content. Despite the prominent performance of existing methods for artificial text detection, they still lack interpretability and robustness towards unseen models. To this end, we propose three novel types of interpretable topological features for this task based on Topological Data Analysis (TDA) which is currently understudied in the field of NLP. We empirically show that the features derived from the BERT model outperform count- and neural-based baselines up to 10\% on three common datasets, and tend to be the most robust towards unseen GPT-style generation models as opposed to existing methods. The probing analysis of the features reveals their sensitivity to the surface and syntactic properties. The results demonstrate that TDA is a promising line with respect to NLP tasks, specifically the ones that incorporate surface and structural information.
    SCRDet++: Detecting Small, Cluttered and Rotated Objects via Instance-Level Feature Denoising and Rotation Loss Smoothing. (arXiv:2004.13316v2 [cs.CV] UPDATED)
    Small and cluttered objects are common in real-world which are challenging for detection. The difficulty is further pronounced when the objects are rotated, as traditional detectors often routinely locate the objects in horizontal bounding box such that the region of interest is contaminated with background or nearby interleaved objects. In this paper, we first innovatively introduce the idea of denoising to object detection. Instance-level denoising on the feature map is performed to enhance the detection to small and cluttered objects. To handle the rotation variation, we also add a novel IoU constant factor to the smooth L1 loss to address the long standing boundary problem, which to our analysis, is mainly caused by the periodicity of angular (PoA) and exchangeability of edges (EoE). By combing these two features, our proposed detector is termed as SCRDet++. Extensive experiments are performed on large aerial images public datasets DOTA, DIOR, UCAS-AOD as well as natural image dataset COCO, scene text dataset ICDAR2015, small traffic light dataset BSTLD and our released S$^2$TLD by this paper. The results show the effectiveness of our approach. The released dataset S2TLD is made public available, which contains 5,786 images with 14,130 traffic light instances across five categories.
    Policy Gradient Stock GAN for Realistic Discrete Order Data Generation in Financial Markets. (arXiv:2204.13338v1 [cs.LG])
    This study proposes a new generative adversarial network (GAN) for generating realistic orders in financial markets. In some previous works, GANs for financial markets generated fake orders in continuous spaces because of GAN architectures' learning limitations. However, in reality, the orders are discrete, such as order prices, which has minimum order price unit, or order types. Thus, we change the generation method to place the generated fake orders into discrete spaces in this study. Because this change disabled the ordinary GAN learning algorithm, this study employed a policy gradient, frequently used in reinforcement learning, for the learning algorithm. Through our experiments, we show that our proposed model outperforms previous models in generated order distribution. As an additional benefit of introducing the policy gradient, the entropy of the generated policy can be used to check GAN's learning status. In the future, higher performance GANs, better evaluation methods, or the applications of our GANs can be addressed.
    Machine learning on DNA-encoded library count data using an uncertainty-aware probabilistic loss function. (arXiv:2108.12471v2 [q-bio.QM] UPDATED)
    DNA-encoded library (DEL) screening and quantitative structure-activity relationship (QSAR) modeling are two techniques used in drug discovery to find small molecules that bind a protein target. Applying QSAR modeling to DEL data can facilitate the selection of compounds for off-DNA synthesis and evaluation. Such a combined approach has been shown recently by training binary classifiers to learn DEL enrichments of aggregated "disynthons" to accommodate the sparse and noisy nature of DEL data. However, a binary classifier cannot distinguish between different levels of enrichment, and information is potentially lost during disynthon aggregation. Here, we demonstrate a regression approach to learning DEL enrichments of individual molecules using a custom negative log-likelihood loss function that effectively denoises DEL data and introduces opportunities for visualization of learned structure-activity relationships (SAR). Our approach explicitly models the Poisson statistics of the sequencing process used in the DEL experimental workflow under a frequentist view. We illustrate this approach on a dataset of 108k compounds screened against CAIX, and a dataset of 5.7M compounds screened against sEH and SIRT2. Due to the treatment of uncertainty in the data through the negative log-likelihood loss function, the models can ignore low-confidence outliers. While our approach does not demonstrate a benefit for extrapolation to novel structures, we expect our denoising and visualization pipeline to be useful in identifying SAR trends and enriched pharmacophores in DEL data. Further, this approach to uncertainty-aware regression is applicable to other sparse or noisy datasets where the nature of stochasticity is known or can be modeled; in particular, the Poisson enrichment ratio metric we use can apply to other settings that compare sequencing count data between two experimental conditions.
    Predicting S&P500 Index direction with Transfer Learning and a Causal Graph as main Input. (arXiv:2011.13113v3 [q-fin.ST] UPDATED)
    We propose a unified multi-tasking framework to represent the complex and uncertain causal process of financial market dynamics, and then to predict the movement of any type of index with an application on the monthly direction of the S&P500 index. our solution is based on three main pillars: (i) the use of transfer learning to share knowledge and feature (representation, learning) between all financial markets, increase the size of the training sample and preserve the stability between training, validation and test sample. (ii) The combination of multidisciplinary knowledge (Financial economics, behavioral finance, market microstructure and portfolio construction theories) to represent a global top-down dynamics of any financial market, through a graph. (iii) The integration of forward looking unstructured data, different types of contexts (long, medium and short term) through latent variables/nodes and then, use a unique VAE network (parameter sharing) to learn simultaneously their distributional representation. We obtain Accuracy, F1-score, and Matthew Correlation of 74.3 %, 67 % and 0.42 above the industry and other benchmark on 12 years test period which include three unstable and difficult sub-period to predict.
    Bilinear value networks. (arXiv:2204.13695v1 [cs.AI])
    The dominant framework for off-policy multi-goal reinforcement learning involves estimating goal conditioned Q-value function. When learning to achieve multiple goals, data efficiency is intimately connected with the generalization of the Q-function to new goals. The de-facto paradigm is to approximate Q(s, a, g) using monolithic neural networks. To improve the generalization of the Q-function, we propose a bilinear decomposition that represents the Q-value via a low-rank approximation in the form of a dot product between two vector fields. The first vector field, f(s, a), captures the environment's local dynamics at the state s; whereas the second component, {\phi}(s, g), captures the global relationship between the current state and the goal. We show that our bilinear decomposition scheme substantially improves data efficiency, and has superior transfer to out-of-distribution goals compared to prior methods. Empirical evidence is provided on the simulated Fetch robot task-suite and dexterous manipulation with a Shadow hand.
    AlphaZero-Inspired General Board Game Learning and Playing. (arXiv:2204.13307v1 [cs.LG])
    Recently, the seminal algorithms AlphaGo and AlphaZero have started a new era in game learning and deep reinforcement learning. While the achievements of AlphaGo and AlphaZero - playing Go and other complex games at super human level - are truly impressive, these architectures have the drawback that they are very complex and require high computational resources. Many researchers are looking for methods that are similar to AlphaZero, but have lower computational demands and are thus more easily reproducible. In this paper, we pick an important element of AlphaZero - the Monte Carlo Tree Search (MCTS) planning stage - and combine it with reinforcement learning (RL) agents. We wrap MCTS for the first time around RL n-tuple networks to create versatile agents that keep at the same time the computational demands low. We apply this new architecture to several complex games (Othello, ConnectFour, Rubik's Cube) and show the advantages achieved with this AlphaZero-inspired MCTS wrapper. In particular, we present results that this AlphaZero-inspired agent is the first one trained on standard hardware (no GPU or TPU) to beat the very strong Othello program Edax up to and including level 7 (where most other algorithms could only defeat Edax up to level 2).
    Exchangeability-Aware Sum-Product Networks. (arXiv:2110.05165v2 [cs.LG] UPDATED)
    Sum-Product Networks (SPNs) are expressive probabilistic models that provide exact, tractable inference. They achieve this efficiency by making use of local independence. On the other hand, mixtures of exchangeable variable models (MEVMs) are a class of tractable probabilistic models that make use of exchangeability of discrete random variables to render inference tractable. Exchangeability, which arises naturally in relational domains, has not been considered for efficient representation and inference in SPNs yet. The contribution of this paper is a novel probabilistic model which we call Exchangeability-Aware Sum-Product Networks (XSPNs). It contains both SPNs and MEVMs as special cases, and combines the ability of SPNs to efficiently learn deep probabilistic models with the ability of MEVMs to efficiently handle exchangeable random variables. We introduce a structure learning algorithm for XSPNs and empirically show that they can be more accurate than conventional SPNs when the data contains repeated, interchangeable parts.
    Supervised machine learning classification for short straddles on the S&P500. (arXiv:2204.13587v1 [q-fin.CP])
    In this working paper we present our current progress in the training of machine learning models to execute short option strategies on the S&P500. As a first step, this paper is breaking this problem down to a supervised classification task to decide if a short straddle on the S&P500 should be executed or not on a daily basis. We describe our used framework and present an overview over our evaluation metrics on different classification models. In this preliminary work, using standard machine learning techniques and without hyperparameter search, we find no statistically significant outperformance to a simple "trade always" strategy, but gain additional insights on how we could proceed in further experiments.
    Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models: Extension. (arXiv:1905.10395v5 [cs.LG] UPDATED)
    We consider distributed optimization under communication constraints for training deep learning models. We propose a new algorithm, whose parameter updates rely on two forces: a regular gradient step, and a corrective direction dictated by the currently best-performing worker (leader). Our method differs from the parameter-averaging scheme EASGD in a number of ways: (i) our objective formulation does not change the location of stationary points compared to the original optimization problem; (ii) we avoid convergence decelerations caused by pulling local workers descending to different local minima to each other (i.e. to the average of their parameters); (iii) our update by design breaks the curse of symmetry (the phenomenon of being trapped in poorly generalizing sub-optimal solutions in symmetric non-convex landscapes); and (iv) our approach is more communication efficient since it broadcasts only parameters of the leader rather than all workers. We provide theoretical analysis of the batch version of the proposed algorithm, which we call Leader Gradient Descent (LGD), and its stochastic variant (LSGD). Finally, we implement an asynchronous version of our algorithm and extend it to the multi-leader setting, where we form groups of workers, each represented by its own local leader (the best performer in a group), and update each worker with a corrective direction comprised of two attractive forces: one to the local, and one to the global leader (the best performer among all workers). The multi-leader setting is well-aligned with current hardware architecture, where local workers forming a group lie within a single computational node and different groups correspond to different nodes. For training convolutional neural networks, we empirically demonstrate that our approach compares favorably to state-of-the-art baselines. This work is a gentle extension of [2].
    Overcoming Catastrophic Forgetting via Direction-Constrained Optimization. (arXiv:2011.12581v2 [cs.LG] UPDATED)
    This paper studies a new design of the optimization algorithm for training deep learning models with a fixed architecture of the classification network in a continual learning framework. The training data is non-stationary and the non-stationarity is imposed by a sequence of distinct tasks. We first analyze a deep model trained on only one learning task in isolation and identify a region in network parameter space, where the model performance is close to the recovered optimum. We provide empirical evidence that this region resembles a cone that expands along the convergence direction. We study the principal directions of the trajectory of the optimizer after convergence and show that traveling along a few top principal directions can quickly bring the parameters outside the cone but this is not the case for the remaining directions. We argue that catastrophic forgetting in a continual learning setting can be alleviated when the parameters are constrained to stay within the intersection of the plausible cones of individual tasks that were so far encountered during training. Based on this observation we present our direction-constrained optimization (DCO) method, where for each task we introduce a linear autoencoder to approximate its corresponding top forbidden principal directions. They are then incorporated into the loss function in the form of a regularization term for the purpose of learning the coming tasks without forgetting. Furthermore, in order to control the memory growth as the number of tasks increases, we propose a memory-efficient version of our algorithm called compressed DCO (DCO-COMP) that allocates a memory of fixed size for storing all autoencoders. We empirically demonstrate that our algorithm performs favorably compared to other state-of-art regularization-based continual learning methods.
    Evaluating Contextual Embeddings and their Extraction Layers for Depression Assessment. (arXiv:2112.13795v2 [cs.CL] UPDATED)
    Recent works have demonstrated ability to assess aspects of mental health from personal discourse. At the same time, pre-trained contextual word embedding models have grown to dominate much of NLP but little is known empirically on how to best apply them for mental health assessment. Using degree of depression as a case study, we do an empirical analysis on which off-the-shelf language model, individual layers, and combinations of layers seem most promising when applied to human-level NLP tasks. Notably, we find RoBERTa most effective and, despite the standard in past work suggesting the second-to-last or concatenation of the last 4 layers, we find layer 19 (sixth-to last) is at least as good as layer 23 when using 1 layer. Further, when using multiple layers, distributing them across the second half (i.e. Layers 12+), rather than last 4, of the 24 layers yielded the most accurate results.
    An Adversarial Attack Analysis on Malicious Advertisement URL Detection Framework. (arXiv:2204.13172v1 [cs.LG])
    Malicious advertisement URLs pose a security risk since they are the source of cyber-attacks, and the need to address this issue is growing in both industry and academia. Generally, the attacker delivers an attack vector to the user by means of an email, an advertisement link or any other means of communication and directs them to a malicious website to steal sensitive information and to defraud them. Existing malicious URL detection techniques are limited and to handle unseen features as well as generalize to test data. In this study, we extract a novel set of lexical and web-scrapped features and employ machine learning technique to set up system for fraudulent advertisement URLs detection. The combination set of six different kinds of features precisely overcome the obfuscation in fraudulent URL classification. Based on different statistical properties, we use twelve different formatted datasets for detection, prediction and classification task. We extend our prediction analysis for mismatched and unlabelled datasets. For this framework, we analyze the performance of four machine learning techniques: Random Forest, Gradient Boost, XGBoost and AdaBoost in the detection part. With our proposed method, we can achieve a false negative rate as low as 0.0037 while maintaining high accuracy of 99.63%. Moreover, we devise a novel unsupervised technique for data clustering using K- Means algorithm for the visual analysis. This paper analyses the vulnerability of decision tree-based models using the limited knowledge attack scenario. We considered the exploratory attack and implemented Zeroth Order Optimization adversarial attack on the detection models.
    Medical Image Segmentation with 3D Convolutional Neural Networks: A Survey. (arXiv:2108.08467v3 [eess.IV] UPDATED)
    Computer-aided medical image analysis plays a significant role in assisting medical practitioners for expert clinical diagnosis and deciding the optimal treatment plan. At present, convolutional neural networks (CNN) are the preferred choice for medical image analysis. In addition, with the rapid advancements in three-dimensional (3D) imaging systems and the availability of excellent hardware and software support to process large volumes of data, 3D deep learning methods are gaining popularity in medical image analysis. Here, we present an extensive review of the recently evolved 3D deep learning methods in medical image segmentation. Furthermore, the research gaps and future directions in 3D medical image segmentation are discussed.
    Improving the Robustness of Federated Learning for Severely Imbalanced Datasets. (arXiv:2204.13414v1 [cs.LG])
    With the ever increasing data deluge and the success of deep neural networks, the research of distributed deep learning has become pronounced. Two common approaches to achieve this distributed learning is synchronous and asynchronous weight update. In this manuscript, we have explored very simplistic synchronous weight update mechanisms. It has been seen that with an increasing number of worker nodes, the performance degrades drastically. This effect has been studied in the context of extreme imbalanced classification (e.g. outlier detection). In practical cases, the assumed conditions of i.i.d. may not be fulfilled. There may also arise global class imbalance situations like that of outlier detection where the local servers receive severely imbalanced data and may not get any samples from the minority class. In that case, the DNNs in the local servers will get completely biased towards the majority class that they receive. This would highly impact the learning at the parameter server (which practically does not see any data). It has been observed that in a parallel setting if one uses the existing federated weight update mechanisms at the parameter server, the performance degrades drastically with the increasing number of worker nodes. This is mainly because, with the increasing number of nodes, there is a high chance that one worker node gets a very small portion of the data, either not enough to train the model without overfitting or having a highly imbalanced class distribution. The chapter, hence, proposes a workaround to this problem by introducing the concept of adaptive cost-sensitive momentum averaging. It is seen that for the proposed system, there was no to minimal degradation in performance while most of the other methods hit their bottom performance before that.
    Policy Gradient Approach to Compilation of Variational Quantum Circuits. (arXiv:2111.10227v2 [quant-ph] UPDATED)
    We propose a method for finding approximate compilations of quantum unitary transformations, based on techniques from policy gradient reinforcement learning. The choice of a stochastic policy allows us to rephrase the optimization problem in terms of probability distributions, rather than variational gates. In this framework, finding the optimal configuration is done by optimizing over distribution parameters, rather than over free angles. We show numerically that this approach can be more competitive than gradient-free methods, for comparable amounts of resources (i.e. quantum circuit runs). Another interesting feature of this approach to variational compilation is that it does not need a separate register and long-range interactions to estimate the end-point fidelity, which is an improvement over methods which rely on the Hilbert-Schmidt test. We expect these techniques to be relevant for training variational circuits in other contexts.
    Vitruvion: A Generative Model of Parametric CAD Sketches. (arXiv:2109.14124v2 [cs.LG] UPDATED)
    Parametric computer-aided design (CAD) tools are the predominant way that engineers specify physical structures, from bicycle pedals to airplanes to printed circuit boards. The key characteristic of parametric CAD is that design intent is encoded not only via geometric primitives, but also by parameterized constraints between the elements. This relational specification can be viewed as the construction of a constraint program, allowing edits to coherently propagate to other parts of the design. Machine learning offers the intriguing possibility of accelerating the design process via generative modeling of these structures, enabling new tools such as autocompletion, constraint inference, and conditional synthesis. In this work, we present such an approach to generative modeling of parametric CAD sketches, which constitute the basic computational building blocks of modern mechanical design. Our model, trained on real-world designs from the SketchGraphs dataset, autoregressively synthesizes sketches as sequences of primitives, with initial coordinates, and constraints that reference back to the sampled primitives. As samples from the model match the constraint graph representation used in standard CAD software, they may be directly imported, solved, and edited according to downstream design tasks. In addition, we condition the model on various contexts, including partial sketches (primers) and images of hand-drawn sketches. Evaluation of the proposed approach demonstrates its ability to synthesize realistic CAD sketches and its potential to aid the mechanical design workflow.
    Optimal Transport for Unsupervised Denoising Learning. (arXiv:2108.02574v4 [eess.IV] UPDATED)
    Recently, much progress has been made in unsupervised denoising learning. However, existing methods more or less rely on some assumptions on the signal and/or degradation model, which limits their practical performance. How to construct an optimal criterion for unsupervised denoising learning without any prior knowledge on the degradation model is still an open question. Toward answering this question, this work proposes a criterion for unsupervised denoising learning based on the optimal transport theory. This criterion has favorable properties, e.g., approximately maximal preservation of the information of the signal, whilst achieving perceptual reconstruction. Furthermore, though a relaxed unconstrained formulation is used in practical implementation, we prove that the relaxed formulation in theory has the same solution as the original constrained formulation. Experiments on synthetic and real-world data, including realistic photographic, microscopy, depth, and raw depth images, demonstrate that the proposed method even compares favorably with supervised methods, e.g., approaching the PSNR of supervised methods while having better perceptual quality. Particularly, for spatially correlated noise and realistic microscopy images, the proposed method not only achieves better perceptual quality but also has higher PSNR than supervised methods. Besides, it shows remarkable superiority in harsh practical conditions with complex noise, e.g., raw depth images. Code is available at https://github.com/wangweiSJTU/OTUR.
    Deepfake Forensics via An Adversarial Game. (arXiv:2103.13567v2 [cs.CV] UPDATED)
    With the progress in AI-based facial forgery (i.e., deepfake), people are increasingly concerned about its abuse. Albeit effort has been made for training classification (also known as deepfake detection) models to recognize such forgeries, existing models suffer from poor generalization to unseen forgery technologies and high sensitivity to changes in image/video quality. In this paper, we advocate adversarial training for improving the generalization ability to both unseen facial forgeries and unseen image/video qualities. We believe training with samples that are adversarially crafted to attack the classification models improves the generalization ability considerably. Considering that AI-based face manipulation often leads to high-frequency artifacts that can be easily spotted by models yet difficult to generalize, we further propose a new adversarial training method that attempts to blur out these specific artifacts, by introducing pixel-wise Gaussian blurring models. With adversarial training, the classification models are forced to learn more discriminative and generalizable features, and the effectiveness of our method can be verified by plenty of empirical evidence. Our code will be made publicly available.
    Multi Type Mean Field Reinforcement Learning. (arXiv:2002.02513v6 [cs.MA] UPDATED)
    Mean field theory provides an effective way of scaling multiagent reinforcement learning algorithms to environments with many agents that can be abstracted by a virtual mean agent. In this paper, we extend mean field multiagent algorithms to multiple types. The types enable the relaxation of a core assumption in mean field reinforcement learning, which is that all agents in the environment are playing almost similar strategies and have the same goal. We conduct experiments on three different testbeds for the field of many agent reinforcement learning, based on the standard MAgents framework. We consider two different kinds of mean field environments: a) Games where agents belong to predefined types that are known a priori and b) Games where the type of each agent is unknown and therefore must be learned based on observations. We introduce new algorithms for each type of game and demonstrate their superior performance over state of the art algorithms that assume that all agents belong to the same type and other baseline algorithms in the MAgent framework.
    KING: Generating Safety-Critical Driving Scenarios for Robust Imitation via Kinematics Gradients. (arXiv:2204.13683v1 [cs.RO])
    Simulators offer the possibility of safe, low-cost development of self-driving systems. However, current driving simulators exhibit na\"ive behavior models for background traffic. Hand-tuned scenarios are typically added during simulation to induce safety-critical situations. An alternative approach is to adversarially perturb the background traffic trajectories. In this paper, we study this approach to safety-critical driving scenario generation using the CARLA simulator. We use a kinematic bicycle model as a proxy to the simulator's true dynamics and observe that gradients through this proxy model are sufficient for optimizing the background traffic trajectories. Based on this finding, we propose KING, which generates safety-critical driving scenarios with a 20% higher success rate than black-box optimization. By solving the scenarios generated by KING using a privileged rule-based expert algorithm, we obtain training data for an imitation learning policy. After fine-tuning on this new data, we show that the policy becomes better at avoiding collisions. Importantly, our generated data leads to reduced collisions on both held-out scenarios generated via KING as well as traditional hand-crafted scenarios, demonstrating improved robustness.
    Fuzzy Cognitive Maps and Hidden Markov Models: Comparative Analysis of Efficiency within the Confines of the Time Series Classification Task. (arXiv:2204.13455v1 [cs.LG])
    Time series classification is one of the very popular machine learning tasks. In this paper, we explore the application of Hidden Markov Model (HMM) for time series classification. We distinguish between two modes of HMM application. The first, in which a single model is built for each class. The second, in which one HMM is built for each time series. We then transfer both approaches for classifier construction to the domain of Fuzzy Cognitive Maps. The identified four models, HMM NN (HMM, one per series), HMM 1C (HMM, one per class), FCM NN, and FCM 1C are then studied in a series of experiments. We compare the performance of different models and investigate the impact of their hyperparameters on the time series classification accuracy. The empirical evaluation shows a clear advantage of the one-model-per-series approach. The results show that the choice between HMM and FCM should be dataset-dependent.
    Autoencoder based Hybrid Multi-Task Predictor Network for Daily Open-High-Low-Close Prices Prediction of Indian Stocks. (arXiv:2204.13422v1 [cs.LG])
    Stock prices are highly volatile and sudden changes in trends are often very problematic for traditional forecasting models to handle. The standard Long Short Term Memory (LSTM) networks are regarded as the state-of-the-art models for such predictions. But, these models fail to handle sudden and drastic changes in the price trend. Moreover, there are some inherent constraints with the open, high, low and close (OHLC) prices of the stocks. Literature lacks the study on the inherent property of OHLC prices. We argue that predicting the OHLC prices for the next day is much more informative than predicting the trends of the stocks as the trend is mostly calculated using these OHLC prices only. The problem mainly is focused on Buy-Today Sell-Tomorrow (BTST) trading. In this regard, AEs when pre-trained with the stock prices, may be beneficial. A novel framework is proposed where a pre-trained encoder is cascaded in front of the multi-task predictor network. This hybrid network can leverage the power of a combination of networks and can both handle the OHLC constraints as well as capture any sudden drastic changes in the prices. It is seen that such a network is much more efficient at predicting stock prices. The experiments have been extended to recommend the most profitable and most overbought stocks on the next day. The model has been tested for multiple Indian companies and it is found that the recommendations from the proposed model have not resulted in a single loss for a test period of 300 days.
    Cumulative Stay-time Representation for Electronic Health Records in Medical Event Time Prediction. (arXiv:2204.13451v1 [cs.LG])
    We address the problem of predicting when a disease will develop, i.e., medical event time (MET), from a patient's electronic health record (EHR). The MET of non-communicable diseases like diabetes is highly correlated to cumulative health conditions, more specifically, how much time the patient spent with specific health conditions in the past. The common time-series representation is indirect in extracting such information from EHR because it focuses on detailed dependencies between values in successive observations, not cumulative information. We propose a novel data representation for EHR called cumulative stay-time representation (CTR), which directly models such cumulative health conditions. We derive a trainable construction of CTR based on neural networks that has the flexibility to fit the target data and scalability to handle high-dimensional EHR. Numerical experiments using synthetic and real-world datasets demonstrate that CTR alone achieves a high prediction performance, and it enhances the performance of existing models when combined with them.
    Reusability and Transferability of Macro Actions for Reinforcement Learning. (arXiv:1908.01478v3 [cs.NE] UPDATED)
    Conventional reinforcement learning (RL) typically determines an appropriate primitive action at each timestep. However, by using a proper macro action, defined as a sequence of primitive actions, an agent is able to bypass intermediate states to a farther state and facilitate its learning procedure. The problem we would like to investigate is what associated beneficial properties that macro actions may possess. In this paper, we unveil the properties of reusability and transferability of macro actions. The first property, reusability, means that a macro action generated along with one RL method can be reused by another RL method for training, while the second one, transferability, means that a macro action can be utilized for training agents in similar environments with different reward settings. In our experiments, we first generate macro actions along with RL methods. We then provide a set of analyses to reveal the properties of reusability and transferability of the generated macro actions.
    A unified theory of information transfer and causal relation. (arXiv:2204.13598v1 [cond-mat.stat-mech])
    Information transfer between coupled stochastic dynamics, measured by transfer entropy and information flow, is suggested as a physical process underlying the causal relation of systems. While information transfer analysis has booming applications in both science and engineering fields, critical mysteries about its foundations remain unsolved. Fundamental yet difficult questions concern how information transfer and causal relation originate, what they depend on, how they differ from each other, and if they are created by a unified and general quantity. These questions essentially determine the validity of causal relation measurement via information transfer. Here we pursue to lay a complete theoretical basis of information transfer and causal relation. Beyond the well-known relations between these concepts that conditionally hold, we demonstrate that information transfer and causal relation universally originate from specific information synergy and redundancy phenomena characterized by high-order mutual information. More importantly, our theory analytically explains the mechanisms for information transfer and causal relation to originate, vanish, and differ from each other. Moreover, our theory naturally defines the effect sizes of information transfer and causal relation based on high-dimensional coupling events. These results may provide a unified view of information, synergy, and causal relation to bridge Pearl's causal inference theory in computer science and information transfer analysis in physics.
    Unaligned Supervision For Automatic Music Transcription in The Wild. (arXiv:2204.13668v1 [cs.SD])
    Multi-instrument Automatic Music Transcription (AMT), or the decoding of a musical recording into semantic musical content, is one of the holy grails of Music Information Retrieval. Current AMT approaches are restricted to piano and (some) guitar recordings, due to difficult data collection. In order to overcome data collection barriers, previous AMT approaches attempt to employ musical scores in the form of a digitized version of the same song or piece. The scores are typically aligned using audio features and strenuous human intervention to generate training labels. We introduce NoteEM, a method for simultaneously training a transcriber and aligning the scores to their corresponding performances, in a fully-automated process. Using this unaligned supervision scheme, complemented by pseudo-labels and pitch-shift augmentation, our method enables training on in-the-wild recordings with unprecedented accuracy and instrumental variety. Using only synthetic data and unaligned supervision, we report SOTA note-level accuracy of the MAPS dataset, and large favorable margins on cross-dataset evaluations. We also demonstrate robustness and ease of use; we report comparable results when training on a small, easily obtainable, self-collected dataset, and we propose alternative labeling to the MusicNet dataset, which we show to be more accurate. Our project page is available at https://benadar293.github.io
    Process-BERT: A Framework for Representation Learning on Educational Process Data. (arXiv:2204.13607v1 [cs.LG])
    Educational process data, i.e., logs of detailed student activities in computerized or online learning platforms, has the potential to offer deep insights into how students learn. One can use process data for many downstream tasks such as learning outcome prediction and automatically delivering personalized intervention. However, analyzing process data is challenging since the specific format of process data varies a lot depending on different learning/testing scenarios. In this paper, we propose a framework for learning representations of educational process data that is applicable across many different learning scenarios. Our framework consists of a pre-training step that uses BERT-type objectives to learn representations from sequential process data and a fine-tuning step that further adjusts these representations on downstream prediction tasks. We apply our framework to the 2019 nation's report card data mining competition dataset that consists of student problem-solving process data and detail the specific models we use in this scenario. We conduct both quantitative and qualitative experiments to show that our framework results in process data representations that are both predictive and informative.
    Unlocking High-Accuracy Differentially Private Image Classification through Scale. (arXiv:2204.13650v1 [cs.LG])
    Differential Privacy (DP) provides a formal privacy guarantee preventing adversaries with access to a machine learning model from extracting information about individual training points. Differentially Private Stochastic Gradient Descent (DP-SGD), the most popular DP training method, realizes this protection by injecting noise during training. However previous works have found that DP-SGD often leads to a significant degradation in performance on standard image classification benchmarks. Furthermore, some authors have postulated that DP-SGD inherently performs poorly on large models, since the norm of the noise required to preserve privacy is proportional to the model dimension. In contrast, we demonstrate that DP-SGD on over-parameterized models can perform significantly better than previously thought. Combining careful hyper-parameter tuning with simple techniques to ensure signal propagation and improve the convergence rate, we obtain a new SOTA on CIFAR-10 of 81.4% under (8, 10^{-5})-DP using a 40-layer Wide-ResNet, improving over the previous SOTA of 71.7%. When fine-tuning a pre-trained 200-layer Normalizer-Free ResNet, we achieve a remarkable 77.1% top-1 accuracy on ImageNet under (1, 8*10^{-7})-DP, and achieve 81.1% under (8, 8*10^{-7})-DP. This markedly exceeds the previous SOTA of 47.9% under a larger privacy budget of (10, 10^{-6})-DP. We believe our results are a significant step towards closing the accuracy gap between private and non-private image classification.
    Bona fide Riesz projections for density estimation. (arXiv:2204.13606v1 [eess.SP])
    The projection of sample measurements onto a reconstruction space represented by a basis on a regular grid is a powerful and simple approach to estimate a probability density function. In this paper, we focus on Riesz bases and propose a projection operator that, in contrast to previous works, guarantees the bona fide properties for the estimate, namely, non-negativity and total probability mass $1$. Our bona fide projection is defined as a convex problem. We propose solution techniques and evaluate them. Results suggest an improved performance, specifically in circumstances prone to rippling effects.
    Foundations for learning from noisy quantum experiments. (arXiv:2204.13691v1 [quant-ph])
    Understanding what can be learned from experiments is central to scientific progress. In this work, we use a learning-theoretic perspective to study the task of learning physical operations in a quantum machine when all operations (state preparation, dynamics, and measurement) are a priori unknown. We prove that, without any prior knowledge, if one can explore the full quantum state space by composing the operations, then every operation can be learned. When one cannot explore the full state space but all operations are approximately known and noise in Clifford gates is gate-independent, we find an efficient algorithm for learning all operations up to a single unlearnable parameter characterizing the fidelity of the initial state. For learning a noise channel on Clifford gates to a fixed accuracy, our algorithm uses quadratically fewer experiments than previously known protocols. Under more general conditions, the true description of the noise can be unlearnable; for example, we prove that no benchmarking protocol can learn gate-dependent Pauli noise on Clifford+T gates even under perfect state preparation and measurement. Despite not being able to learn the noise, we show that a noisy quantum computer that performs entangled measurements on multiple copies of an unknown state can yield a large advantage in learning properties of the state compared to a noiseless device that measures individual copies and then processes the measurement data using a classical computer. Concretely, we prove that noisy quantum computers with two-qubit gate error rate $\epsilon$ can achieve a learning task using $N$ copies of the state, while $N^{\Omega(1/\epsilon)}$ copies are required classically.
    Phase Shift Design in RIS Empowered Wireless Networks: From Optimization to AI-Based Methods. (arXiv:2204.13372v1 [cs.LG])
    Reconfigurable intelligent surfaces (RISs) have a revolutionary capability to customize the radio propagation environment for wireless networks. To fully exploit the advantages of RISs in wireless systems, the phases of the reflecting elements must be jointly designed with conventional communication resources, such as beamformers, transmit power, and computation time. However, due to the unique constraints on the phase shift, and massive numbers of reflecting units and users in large-scale networks, the resulting optimization problems are challenging to solve. This paper provides a review of current optimization methods and artificial intelligence-based methods for handling the constraints imposed by RIS and compares them in terms of solution quality and computational complexity. Future challenges in phase shift optimization involving RISs are also described and potential solutions are discussed.
    It's DONE: Direct ONE-shot learning without training optimization. (arXiv:2204.13361v1 [cs.LG])
    Learning a new concept from one example is a superior function of human brain and it is drawing attention in the field of machine learning as one-shot learning task. In this paper, we propose the simplest method for this task, named Direct ONE-shot learning (DONE). DONE adds a new class to a pretrained deep neural network (DNN) classifier with neither training optimization nor other-classes modification. DONE is inspired by Hebbian theory and directly uses the neural activity input of the final dense layer obtained from a data that belongs to the new additional class as the connectivity weight (synaptic strength) with a newly-provided-output neuron for the new class. DONE requires just one inference for obtaining the output of the final dense layer and its procedure is simple, deterministic, not requiring parameter tuning and hyperparameters. The performance of DONE depends entirely on the pretrained DNN model used as a backbone model, and we confirmed that DONE with a well-trained backbone model performs a practical-level accuracy. DONE has some advantages including a DNN's practical use that is difficult to spend high cost for a training, an evaluation of existing DNN models, and the understanding of the brain. DONE might be telling us one-shot learning is an easy task that can be achieved by a simple principle not only for humans but also for current well-trained DNN models.
    List-Mode PET Image Reconstruction Using Deep Image Prior. (arXiv:2204.13404v1 [physics.med-ph])
    List-mode positron emission tomography (PET) image reconstruction is an important tool for PET scanners with many lines-of-response (LORs) and additional information such as time-of-flight and depth-of-interaction. Deep learning is one possible solution to enhance the quality of PET image reconstruction. However, the application of deep learning techniques to list-mode PET image reconstruction have not been progressed because list data is a sequence of bit codes and unsuitable for processing by convolutional neural networks (CNN). In this study, we propose a novel list-mode PET image reconstruction method using an unsupervised CNN called deep image prior (DIP) and a framework of alternating direction method of multipliers. The proposed list-mode DIP reconstruction (LM-DIPRecon) method alternatively iterates regularized list-mode dynamic row action maximum likelihood algorithm (LM-DRAMA) and magnetic resonance imaging conditioned DIP (MR-DIP). We evaluated LM-DIPRecon using both simulation and clinical data, and it achieved sharper images and better tradeoff curves between contrast and noise than the LM-DRAMA and MR-DIP. These results indicated that the LM-DIPRecon is useful for quantitative PET imaging with limited events. In addition, as list data has finer temporal information than dynamic sinograms, list-mode deep image prior reconstruction is expected to be useful for 4D PET imaging and motion correction.
    Poisoning Deep Learning based Recommender Model in Federated Learning Scenarios. (arXiv:2204.13594v1 [cs.IR])
    Various attack methods against recommender systems have been proposed in the past years, and the security issues of recommender systems have drawn considerable attention. Traditional attacks attempt to make target items recommended to as many users as possible by poisoning the training data. Benifiting from the feature of protecting users' private data, federated recommendation can effectively defend such attacks. Therefore, quite a few works have devoted themselves to developing federated recommender systems. For proving current federated recommendation is still vulnerable, in this work we probe to design attack approaches targeting deep learning based recommender models in federated learning scenarios. Specifically, our attacks generate poisoned gradients for manipulated malicious users to upload based on two strategies (i.e., random approximation and hard user mining). Extensive experiments show that our well-designed attacks can effectively poison the target models, and the attack effectiveness sets the state-of-the-art.
    Model Selection, Adaptation, and Combination for Deep Transfer Learning through Neural Networks in Renewable Energies. (arXiv:2204.13293v1 [cs.LG])
    There is recent interest in using model hubs, a collection of pre-trained models, in computer vision tasks. To utilize the model hub, we first select a source model and then adapt the model for the target to compensate for differences. While there is yet limited research on a model selection and adaption for computer vision tasks, this holds even more for the field of renewable power. At the same time, it is a crucial challenge to provide forecasts for the increasing demand for power forecasts based on weather features from a numerical weather prediction. We close these gaps by conducting the first thorough experiment for model selection and adaptation for transfer learning in renewable power forecast, adopting recent results from the field of computer vision on six datasets. We adopt models based on data from different seasons and limit the amount of training data. As an extension of the current state of the art, we utilize a Bayesian linear regression for forecasting the response based on features extracted from a neural network. This approach outperforms the baseline with only seven days of training data. We further show how combining multiple models through ensembles can significantly improve the model selection and adaptation approach. In fact, with more than 30 days of training data, both proposed model combination techniques achieve similar results to those models trained with a full year of training data.
    On Parametric Optimal Execution and Machine Learning Surrogates. (arXiv:2204.08581v2 [q-fin.TR] UPDATED)
    We investigate optimal order execution problems in discrete time with instantaneous price impact and stochastic resilience. First, in the setting of linear transient price impact we derive a closed-form recursion for the optimal strategy, extending the deterministic results from Obizhaeva and Wang (J Financial Markets, 2013). Second, we develop a numerical algorithm based on dynamic programming and deep learning for the case of nonlinear transient price impact as proposed by Bouchaud et al. (Quant. Finance, 2004). Specifically, we utilize an actor-critic framework that constructs two neural-network (NN) surrogates for the value function and the feedback control. The flexible scalability of NN functional approximators enables parametric learning, i.e., incorporating several model or market parameters as part of the input space. Precise calibration of price impact, resilience, etc., is known to be extremely challenging and hence it is critical to understand sensitivity of the execution policy to these parameters. Our NN learner organically scales across multiple input dimensions and is shown to accurately approximate optimal strategies across a wide range of parameter configurations. We provide a fully reproducible Jupyter Notebook with our NN implementation, which is of independent pedagogical interest, demonstrating the ease of use of NN surrogates in (parametric) stochastic control problems.
    Predicting Sleeping Quality using Convolutional Neural Networks. (arXiv:2204.13584v1 [eess.SP])
    Identifying sleep stages and patterns is an essential part of diagnosing and treating sleep disorders. With the advancement of smart technologies, sensor data related to sleeping patterns can be captured easily. In this paper, we propose a Convolution Neural Network (CNN) architecture that improves the classification performance. In particular, we benchmark the classification performance from different methods, including traditional machine learning methods such as Logistic Regression (LR), Decision Trees (DT), k-Nearest Neighbour (k-NN), Naive Bayes (NB) and Support Vector Machine (SVM), on 3 publicly available sleep datasets. The accuracy, sensitivity, specificity, precision, recall, and F-score are reported and will serve as a baseline to simulate the research in this direction in the future.
    Regotron: Regularizing the Tacotron2 architecture via monotonic alignment loss. (arXiv:2204.13437v1 [cs.SD])
    Recent deep learning Text-to-Speech (TTS) systems have achieved impressive performance by generating speech close to human parity. However, they suffer from training stability issues as well as incorrect alignment of the intermediate acoustic representation with the input text sequence. In this work, we introduce Regotron, a regularized version of Tacotron2 which aims to alleviate the training issues and at the same time produce monotonic alignments. Our method augments the vanilla Tacotron2 objective function with an additional term, which penalizes non-monotonic alignments in the location-sensitive attention mechanism. By properly adjusting this regularization term we show that the loss curves become smoother, and at the same time Regotron consistently produces monotonic alignments in unseen examples even at an early stage (13\% of the total number of epochs) of its training process, whereas the fully converged Tacotron2 fails to do so. Moreover, our proposed regularization method has no additional computational overhead, while reducing common TTS mistakes and achieving slighlty improved speech naturalness according to subjective mean opinion scores (MOS) collected from 50 evaluators.
    Multi-Player Multi-Armed Bandits with Finite Shareable Resources Arms: Learning Algorithms & Applications. (arXiv:2204.13502v1 [cs.LG])
    Multi-player multi-armed bandits (MMAB) study how decentralized players cooperatively play the same multi-armed bandit so as to maximize their total cumulative rewards. Existing MMAB models mostly assume when more than one player pulls the same arm, they either have a collision and obtain zero rewards, or have no collision and gain independent rewards, both of which are usually too restrictive in practical scenarios. In this paper, we propose an MMAB with shareable resources as an extension to the collision and non-collision settings. Each shareable arm has finite shareable resources and a "per-load" reward random variable, both of which are unknown to players. The reward from a shareable arm is equal to the "per-load" reward multiplied by the minimum between the number of players pulling the arm and the arm's maximal shareable resources. We consider two types of feedback: sharing demand information (SDI) and sharing demand awareness (SDA), each of which provides different signals of resource sharing. We design the DPE-SDI and SIC-SDA algorithms to address the shareable arm problem under these two cases of feedback respectively and prove that both algorithms have logarithmic regrets that are tight in the number of rounds. We conduct simulations to validate both algorithms' performance and show their utilities in wireless networking and edge computing.
    Predicting batch queue job wait times for informed scheduling of urgent HPC workloads. (arXiv:2204.13543v1 [cs.DC])
    There is increasing interest in the use of HPC machines for urgent workloads to help tackle disasters as they unfold. Whilst batch queue systems are not ideal in supporting such workloads, many disadvantages can be worked around by accurately predicting when a waiting job will start to run. However there are numerous challenges in achieving such a prediction with high accuracy, not least because the queue's state can change rapidly and depend upon many factors. In this work we explore a novel machine learning approach for predicting queue wait times, hypothesising that such a model can capture the complex behaviour resulting from the queue policy and other interactions to generate accurate job start times. For ARCHER2 (HPE Cray EX), Cirrus (HPE 8600) and 4-cabinet (HPE Cray EX) we explore how different machine learning approaches and techniques improve the accuracy of our predictions, comparing against the estimation generated by Slurm. We demonstrate that our techniques deliver the most accurate predictions across our machines of interest, with the result of this work being the ability to predict job start times within one minute of the actual start time for around 65\% of jobs on ARCHER2 and 4-cabinet, and 76\% of jobs on Cirrus. When compared against what Slurm can deliver, this represents around 3.8 times better accuracy on ARCHER2 and 18 times better for Cirrus. Furthermore our approach can accurately predicting the start time for three quarters of all job within ten minutes of the actual start time on ARCHER2 and 4-cabinet, and for 90\% of jobs on Cirrus. Whilst the driver of this work has been to better facilitate placement of urgent workloads across HPC machines, the insights gained can be used to provide wider benefits to users and also enrich existing batch queue systems and inform policy too.
    BI-GreenNet: Learning Green's functions by boundary integral network. (arXiv:2204.13247v1 [cs.LG])
    Green's function plays a significant role in both theoretical analysis and numerical computing of partial differential equations (PDEs). However, in most cases, Green's function is difficult to compute. The troubles arise in the following three folds. Firstly, compared with the original PDE, the dimension of Green's function is doubled, making it impossible to be handled by traditional mesh-based methods. Secondly, Green's function usually contains singularities which increase the difficulty to get a good approximation. Lastly, the computational domain may be very complex or even unbounded. To override these problems, we leverage the fundamental solution, boundary integral method and neural networks to develop a new method for computing Green's function with high accuracy in this paper. We focus on Green's function of Poisson and Helmholtz equations in bounded domains, unbounded domains. We also consider Poisson equation and Helmholtz domains with interfaces. Extensive numerical experiments illustrate the efficiency and the accuracy of our method for solving Green's function. In addition, we also use the Green's function calculated by our method to solve a class of PDE, and also obtain high-precision solutions, which shows the good generalization ability of our method on solving PDEs.
    Deep graph matching meets mixed-integer linear programming: Relax at your own risk ?. (arXiv:2108.00394v5 [cs.CV] UPDATED)
    Graph matching is an important problem that has received widespread attention, especially in the field of computer vision. Recently, state-of-the-art methods seek to incorporate graph matching with deep learning. However, there is no research to explain what role the graph matching algorithm plays in the model. Therefore, we propose an approach integrating a MILP formulation of the graph matching problem. This formulation is solved to optimal and it provides inherent baseline. Meanwhile, similar approaches are derived by releasing the optimal guarantee of the graph matching solver and by introducing a quality level. This quality level controls the quality of the solutions provided by the graph matching solver. In addition, several relaxations of the graph matching problem are put to the test. Our experimental evaluation gives several theoretical insights and guides the direction of deep graph matching methods.
    Predicting single-cell perturbation responses for unseen drugs. (arXiv:2204.13545v1 [cs.LG])
    Single-cell transcriptomics enabled the study of cellular heterogeneity in response to perturbations at the resolution of individual cells. However, scaling high-throughput screens (HTSs) to measure cellular responses for many drugs remains a challenge due to technical limitations and, more importantly, the cost of such multiplexed experiments. Thus, transferring information from routinely performed bulk RNA-seq HTS is required to enrich single-cell data meaningfully. We introduce a new encoder-decoder architecture to study the perturbational effects of unseen drugs. We combine the model with a transfer learning scheme and demonstrate how training on existing bulk RNA-seq HTS datasets can improve generalisation performance. Better generalisation reduces the need for extensive and costly screens at single-cell resolution. We envision that our proposed method will facilitate more efficient experiment designs through its ability to generate in-silico hypotheses, ultimately accelerating targeted drug discovery.
    Exploring How Anomalous Model Input and Output Alerts Affect Decision-Making in Healthcare. (arXiv:2204.13194v1 [cs.HC])
    An important goal in the field of human-AI interaction is to help users more appropriately trust AI systems' decisions. A situation in which the user may particularly benefit from more appropriate trust is when the AI receives anomalous input or provides anomalous output. To the best of our knowledge, this is the first work towards understanding how anomaly alerts may contribute to appropriate trust of AI. In a formative mixed-methods study with 4 radiologists and 4 other physicians, we explore how AI alerts for anomalous input, very high and low confidence, and anomalous saliency-map explanations affect users' experience with mockups of an AI clinical decision support system (CDSS) for evaluating chest x-rays for pneumonia. We find evidence suggesting that the four anomaly alerts are desired by non-radiologists, and the high-confidence alerts are desired by both radiologists and non-radiologists. In a follow-up user study, we investigate how high- and low-confidence alerts affect the accuracy and thus appropriate trust of 33 radiologists working with AI CDSS mockups. We observe that these alerts do not improve users' accuracy or experience and discuss potential reasons why.
    Machine Learning for Violence Risk Assessment Using Dutch Clinical Notes. (arXiv:2204.13535v1 [cs.LG])
    Violence risk assessment in psychiatric institutions enables interventions to avoid violence incidents. Clinical notes written by practitioners and available in electronic health records are valuable resources capturing unique information, but are seldom used to their full potential. We explore conventional and deep machine learning methods to assess violence risk in psychiatric patients using practitioner notes. The performance of our best models is comparable to the currently used questionnaire-based method, with an area under the Receiver Operating Characteristic curve of approximately 0.8. We find that the deep-learning model BERTje performs worse than conventional machine learning methods. We also evaluate our data and our classifiers to understand the performance of our models better. This is particularly important for the applicability of evaluated classifiers to new data, and is also of great interest to practitioners, due to the increased availability of new data in electronic format.
    Learning Storm Surge with Gradient Boosting. (arXiv:2204.13168v1 [cs.CE])
    Storm surge is a major natural hazard for coastal regions, responsible both for significant property damage and loss of life. Accurate, efficient models of storm surge are needed both to assess long-term risk and to guide emergency management decisions. While high-fidelity ocean circulation models such as the ADvanced CIRCulation (ADCIRC) model can accurately predict storm surge, they are very computationally expensive. Consequently, there have been a number of efforts in recent years to develop data-driven surrogate models for storm surge. While these models can attain good accuracy and are highly efficient, they are often limited to a small geographical region and a fixed set of output locations. We develop a novel surrogate model for peak storm surge prediction based on gradient boosting. Unlike most surrogate approaches, our model is not explicitly constrained to a fixed set of output locations or specific geographical region. The model is trained with a database of 446 synthetic storms that make landfall on the Texas coast and obtains a mean absolute error of 0.25 meters. We additionally present a test of the model on Hurricanes Ike (2008) and Harvey (2017).
    FedShuffle: Recipes for Better Use of Local Work in Federated Learning. (arXiv:2204.13169v1 [cs.LG])
    The practice of applying several local updates before aggregation across clients has been empirically shown to be a successful approach to overcoming the communication bottleneck in Federated Learning (FL). In this work, we propose a general recipe, FedShuffle, that better utilizes the local updates in FL, especially in the heterogeneous regime. Unlike many prior works, FedShuffle does not assume any uniformity in the number of updates per device. Our FedShuffle recipe comprises four simple-yet-powerful ingredients: 1) local shuffling of the data, 2) adjustment of the local learning rates, 3) update weighting, and 4) momentum variance reduction (Cutkosky and Orabona, 2019). We present a comprehensive theoretical analysis of FedShuffle and show that both theoretically and empirically, our approach does not suffer from the objective function mismatch that is present in FL methods which assume homogeneous updates in heterogeneous FL setups, e.g., FedAvg (McMahan et al., 2017). In addition, by combining the ingredients above, FedShuffle improves upon FedNova (Wang et al., 2020), which was previously proposed to solve this mismatch. We also show that FedShuffle with momentum variance reduction can improve upon non-local methods under a Hessian similarity assumption. Finally, through experiments on synthetic and real-world datasets, we illustrate how each of the four ingredients used in FedShuffle helps improve the use of local updates in FL.
    Learning-to-Rank at the Speed of Sampling: Plackett-Luce Gradient Estimation With Minimal Computational Complexity. (arXiv:2204.10872v2 [cs.LG] UPDATED)
    Plackett-Luce gradient estimation enables the optimization of stochastic ranking models within feasible time constraints through sampling techniques. Unfortunately, the computational complexity of existing methods does not scale well with the length of the rankings, i.e. the ranking cutoff, nor with the item collection size. In this paper, we introduce the novel PL-Rank-3 algorithm that performs unbiased gradient estimation with a computational complexity comparable to the best sorting algorithms. As a result, our novel learning-to-rank method is applicable in any scenario where standard sorting is feasible in reasonable time. Our experimental results indicate large gains in the time required for optimization, without any loss in performance. For the field, our contribution could potentially allow state-of-the-art learning-to-rank methods to be applied to much larger scales than previously feasible.
    Unsupervised Spatial-spectral Hyperspectral Image Reconstruction and Clustering with Diffusion Geometry. (arXiv:2204.13497v1 [cs.CV])
    Hyperspectral images, which store a hundred or more spectral bands of reflectance, have become an important data source in natural and social sciences. Hyperspectral images are often generated in large quantities at a relatively coarse spatial resolution. As such, unsupervised machine learning algorithms incorporating known structure in hyperspectral imagery are needed to analyze these images automatically. This work introduces the Spatial-Spectral Image Reconstruction and Clustering with Diffusion Geometry (DSIRC) algorithm for partitioning highly mixed hyperspectral images. DSIRC reduces measurement noise through a shape-adaptive reconstruction procedure. In particular, for each pixel, DSIRC locates spectrally correlated pixels within a data-adaptive spatial neighborhood and reconstructs that pixel's spectral signature using those of its neighbors. DSIRC then locates high-density, high-purity pixels far in diffusion distance (a data-dependent distance metric) from other high-density, high-purity pixels and treats these as cluster exemplars, giving each a unique label. Non-modal pixels are assigned the label of their diffusion distance-nearest neighbor of higher density and purity that is already labeled. Strong numerical results indicate that incorporating spatial information through image reconstruction substantially improves the performance of pixel-wise clustering.
    On the Normalizing Constant of the Continuous Categorical Distribution. (arXiv:2204.13290v1 [stat.ML])
    Probability distributions supported on the simplex enjoy a wide range of applications across statistics and machine learning. Recently, a novel family of such distributions has been discovered: the continuous categorical. This family enjoys remarkable mathematical simplicity; its density function resembles that of the Dirichlet distribution, but with a normalizing constant that can be written in closed form using elementary functions only. In spite of this mathematical simplicity, our understanding of the normalizing constant remains far from complete. In this work, we characterize the numerical behavior of the normalizing constant and we present theoretical and methodological advances that can, in turn, help to enable broader applications of the continuous categorical distribution. Our code is available at https://github.com/cunningham-lab/cb_and_cc/.
    Machine learning for knowledge acquisition and accelerated inverse-design for non-Hermitian systems. (arXiv:2204.13376v1 [physics.optics])
    Non-Hermitian systems offer new platforms for unusual physical properties that can be flexibly manipulated by redistribution of the real and imaginary parts of refractive indices, whose presence breaks conventional wave propagation symmetries, leading to asymmetric reflection and symmetric transmission with respect to the wave propagation direction. Here, we use supervised and unsupervised learning techniques for knowledge acquisition in non-Hermitian systems which accelerate the inverse design process. In particular, we construct a deep learning model that relates the transmission and asymmetric reflection in non-conservative settings and proposes sub-manifold learning to recognize non-Hermitian features from transmission spectra. The developed deep learning framework determines the feasibility of a desired spectral response for a given structure and uncovers the role of effective gain-loss parameters to tailor the spectral response. These findings pave the way for intelligent inverse design and shape our understanding of the physical mechanism in general non-Hermitian systems.
    MetaCVR: Conversion Rate Prediction via Meta Learning in Small-Scale Recommendation Scenarios. (arXiv:2112.13753v5 [cs.LG] UPDATED)
    Different from large-scale platforms such as Taobao and Amazon, CVR modeling in small-scale recommendation scenarios is more challenging due to the severe Data Distribution Fluctuation (DDF) issue. DDF prevents existing CVR models from being effective since 1) several months of data are needed to train CVR models sufficiently in small scenarios, leading to considerable distribution discrepancy between training and online serving; and 2) e-commerce promotions have significant impacts on small scenarios, leading to distribution uncertainty of the upcoming time period. In this work, we propose a novel CVR method named MetaCVR from a perspective of meta learning to address the DDF issue. Firstly, a base CVR model which consists of a Feature Representation Network (FRN) and output layers is designed and trained sufficiently with samples across months. Then we treat time periods with different data distributions as different occasions and obtain positive and negative prototypes for each occasion using the corresponding samples and the pre-trained FRN. Subsequently, a Distance Metric Network (DMN) is devised to calculate the distance metrics between each sample and all prototypes to facilitate mitigating the distribution uncertainty. At last, we develop an Ensemble Prediction Network (EPN) which incorporates the output of FRN and DMN to make the final CVR prediction. In this stage, we freeze the FRN and train the DMN and EPN with samples from recent time period, therefore effectively easing the distribution discrepancy. To the best of our knowledge, this is the first study of CVR prediction targeting the DDF issue in small-scale recommendation scenarios. Experimental results on real-world datasets validate the superiority of our MetaCVR and online A/B test also shows our model achieves impressive gains of 11.92% on PCVR and 8.64% on GMV.
    Music Enhancement via Image Translation and Vocoding. (arXiv:2204.13289v1 [cs.SD])
    Consumer-grade music recordings such as those captured by mobile devices typically contain distortions in the form of background noise, reverb, and microphone-induced EQ. This paper presents a deep learning approach to enhance low-quality music recordings by combining (i) an image-to-image translation model for manipulating audio in its mel-spectrogram representation and (ii) a music vocoding model for mapping synthetically generated mel-spectrograms to perceptually realistic waveforms. We find that this approach to music enhancement outperforms baselines which use classical methods for mel-spectrogram inversion and an end-to-end approach directly mapping noisy waveforms to clean waveforms. Additionally, in evaluating the proposed method with a listening test, we analyze the reliability of common audio enhancement evaluation metrics when used in the music domain.
    Control-Aware Prediction Objectives for Autonomous Driving. (arXiv:2204.13319v1 [cs.LG])
    Autonomous vehicle software is typically structured as a modular pipeline of individual components (e.g., perception, prediction, and planning) to help separate concerns into interpretable sub-tasks. Even when end-to-end training is possible, each module has its own set of objectives used for safety assurance, sample efficiency, regularization, or interpretability. However, intermediate objectives do not always align with overall system performance. For example, optimizing the likelihood of a trajectory prediction module might focus more on easy-to-predict agents than safety-critical or rare behaviors (e.g., jaywalking). In this paper, we present control-aware prediction objectives (CAPOs), to evaluate the downstream effect of predictions on control without requiring the planner be differentiable. We propose two types of importance weights that weight the predictive likelihood: one using an attention model between agents, and another based on control variation when exchanging predicted trajectories for ground truth trajectories. Experimentally, we show our objectives improve overall system performance in suburban driving scenarios using the CARLA simulator.
    Partitioned Variational Inference: A Framework for Probabilistic Federated Learning. (arXiv:2202.12275v4 [stat.ML] UPDATED)
    The proliferation of computing devices has brought about an opportunity to deploy machine learning models on new problem domains using previously inaccessible data. Traditional algorithms for training such models often require data to be stored on a single machine with compute performed by a single node, making them unsuitable for decentralised training on multiple devices. This deficiency has motivated the development of federated learning algorithms, which allow multiple data owners to train collaboratively and use a shared model whilst keeping local data private. However, many of these algorithms focus on obtaining point estimates of model parameters, rather than probabilistic estimates capable of capturing model uncertainty, which is essential in many applications. Variational inference (VI) has become the method of choice for fitting many modern probabilistic models. In this paper we introduce partitioned variational inference (PVI), a general framework for performing VI in the federated setting. We develop new supporting theory for PVI, demonstrating a number of properties that make it an attractive choice for practitioners; use PVI to unify a wealth of fragmented, yet related literature; and provide empirical results that showcase the effectiveness of PVI in a variety of federated settings.
    UNBUS: Uncertainty-aware Deep Botnet Detection System in Presence of Perturbed Samples. (arXiv:2204.09502v2 [cs.CR] UPDATED)
    A rising number of botnet families have been successfully detected using deep learning architectures. While the variety of attacks increases, these architectures should become more robust against attacks. They have been proven to be very sensitive to small but well constructed perturbations in the input. Botnet detection requires extremely low false-positive rates (FPR), which are not commonly attainable in contemporary deep learning. Attackers try to increase the FPRs by making poisoned samples. The majority of recent research has focused on the use of model loss functions to build adversarial examples and robust models. In this paper, two LSTM-based classification algorithms for botnet classification with an accuracy higher than 98% are presented. Then, the adversarial attack is proposed, which reduces the accuracy to about 30%. Then, by examining the methods for computing the uncertainty, the defense method is proposed to increase the accuracy to about 70%. By using the deep ensemble and stochastic weight averaging quantification methods it has been investigated the uncertainty of the accuracy in the proposed methods.
    Actor-Critic Scheduling for Path-Aware Air-to-Ground Multipath Multimedia Delivery. (arXiv:2204.13343v1 [cs.NI])
    Reinforcement Learning (RL) has recently found wide applications in network traffic management and control because some of its variants do not require prior knowledge of network models. In this paper, we present a novel scheduler for real-time multimedia delivery in multipath systems based on an Actor-Critic (AC) RL algorithm. We focus on a challenging scenario of real-time video streaming from an Unmanned Aerial Vehicle (UAV) using multiple wireless paths. The scheduler acting as an RL agent learns in real-time the optimal policy for path selection, path rate allocation and redundancy estimation for flow protection. The scheduler, implemented as a module of the GStreamer framework, can be used in real or simulated settings. The simulation results show that our scheduler can target a very low loss rate at the receiver by dynamically adapting in real-time the scheduling policy to the path conditions without performing training or relying on prior knowledge of network channel models.
    ELM: Embedding and Logit Margins for Long-Tail Learning. (arXiv:2204.13208v1 [cs.LG])
    Long-tail learning is the problem of learning under skewed label distributions, which pose a challenge for standard learners. Several recent approaches for the problem have proposed enforcing a suitable margin in logit space. Such techniques are intuitive analogues of the guiding principle behind SVMs, and are equally applicable to linear models and neural models. However, when applied to neural models, such techniques do not explicitly control the geometry of the learned embeddings. This can be potentially sub-optimal, since embeddings for tail classes may be diffuse, resulting in poor generalization for these classes. We present Embedding and Logit Margins (ELM), a unified approach to enforce margins in logit space, and regularize the distribution of embeddings. This connects losses for long-tail learning to proposals in the literature on metric embedding, and contrastive learning. We theoretically show that minimising the proposed ELM objective helps reduce the generalisation gap. The ELM method is shown to perform well empirically, and results in tighter tail class embeddings.
    Minimizing Client Drift in Federated Learning via Adaptive Bias Estimation. (arXiv:2204.13170v1 [cs.LG])
    In Federated Learning a number of clients collaborate to train a model without sharing their data. Client models are optimized locally and are communicated through a central hub called server. A major challenge is to deal with heterogeneity among clients' data which causes the local optimization to drift away with respect to the global objective. In order to estimate and therefore remove this drift, variance reduction techniques have been incorporated into Federated Learning optimization recently. However, the existing solutions propagate the error of their estimations, throughout the optimization trajectory which leads to inaccurate approximations of the clients' drift and ultimately failure to remove them properly. In this paper, we address this issue by introducing an adaptive algorithm that efficiently reduces clients' drift. Compared to the previous works on adapting variance reduction to Federated Learning, our approach uses less or the same level of communication bandwidth, computation or memory. Additionally, it addresses the instability problem--prevalent in prior work, caused by increasing norm of the estimates which makes our approach a much more practical solution for large scale Federated Learning settings. Our experimental results demonstrate that the proposed algorithm converges significantly faster and achieves higher accuracy compared to the baselines in an extensive set of Federated Learning benchmarks.
    Continual Learning with Bayesian Model based on a Fixed Pre-trained Feature Extractor. (arXiv:2204.13349v1 [cs.LG])
    Deep learning has shown its human-level performance in various applications. However, current deep learning models are characterised by catastrophic forgetting of old knowledge when learning new classes. This poses a challenge particularly in intelligent diagnosis systems where initially only training data of a limited number of diseases are available. In this case, updating the intelligent system with data of new diseases would inevitably downgrade its performance on previously learned diseases. Inspired by the process of learning new knowledge in human brains, we propose a Bayesian generative model for continual learning built on a fixed pre-trained feature extractor. In this model, knowledge of each old class can be compactly represented by a collection of statistical distributions, e.g. with Gaussian mixture models, and naturally kept from forgetting in continual learning over time. Unlike existing class-incremental learning methods, the proposed approach is not sensitive to the continual learning process and can be additionally well applied to the data-incremental learning scenario. Experiments on multiple medical and natural image classification tasks showed that the proposed approach outperforms state-of-the-art approaches which even keep some images of old classes during continual learning of new classes.
    AutoLossGen: Automatic Loss Function Generation for Recommender Systems. (arXiv:2204.13160v1 [cs.IR])
    In recommendation systems, the choice of loss function is critical since a good loss may significantly improve the model performance. However, manually designing a good loss is a big challenge due to the complexity of the problem. A large fraction of previous work focuses on handcrafted loss functions, which needs significant expertise and human effort. In this paper, inspired by the recent development of automated machine learning, we propose an automatic loss function generation framework, AutoLossGen, which is able to generate loss functions directly constructed from basic mathematical operators without prior knowledge on loss structure. More specifically, we develop a controller model driven by reinforcement learning to generate loss functions, and develop iterative and alternating optimization schedule to update the parameters of both the controller model and the recommender model. One challenge for automatic loss generation in recommender systems is the extreme sparsity of recommendation datasets, which leads to the sparse reward problem for loss generation and search. To solve the problem, we further develop a reward filtering mechanism for efficient and effective loss generation. Experimental results show that our framework manages to create tailored loss functions for different recommendation models and datasets, and the generated loss gives better recommendation performance than commonly used baseline losses. Besides, most of the generated losses are transferable, i.e., the loss generated based on one model and dataset also works well for another model or dataset. Source code of the work is available at https://github.com/rutgerswiselab/AutoLossGen.
    SwiftAgg+: Achieving Asymptotically Optimal Communication Loads in Secure Aggregation for Federated Learning. (arXiv:2203.13060v2 [cs.IT] UPDATED)
    We propose SwiftAgg+, a novel secure aggregation protocol for federated learning systems, where a central server aggregates local models of $N \in \mathbb{N}$ distributed users, each of size $L \in \mathbb{N}$, trained on their local data, in a privacy-preserving manner. SwiftAgg+ can significantly reduce the communication overheads without any compromise on security, and achieve optimal communication loads within diminishing gaps. Specifically, in presence of at most $D$ dropout users, SwiftAgg+ achieves a per-user communication load of $(1+\mathcal{O}(\frac{1}{N}))L$ and a server communication load of $(1+\mathcal{O}(\frac{1}{N}))L$, with a worst-case information-theoretic security guarantee, against any subset of up to $T$ semi-honest users who may also collude with the curious server. Moreover, the proposed SwiftAgg+ allows for a flexible trade-off between communication loads and the number of active communication links. In particular, for any $K\in\mathbb{N}$, SwiftAgg+ can achieve the server communication load of $(1+\frac{T}{K})L$, and per-user communication load of up to $(1+\frac{T+D}{K})L$, where the number of pair-wise active connections in the network is $\frac{N}{2}(K+T+D+1)$.
    Gaussian Processes and Statistical Decision-making in Non-Euclidean Spaces. (arXiv:2202.10613v3 [stat.ML] UPDATED)
    Bayesian learning using Gaussian processes provides a foundational framework for making decisions in a manner that balances what is known with what could be learned by gathering data. In this dissertation, we develop techniques for broadening the applicability of Gaussian processes. This is done in two ways. Firstly, we develop pathwise conditioning techniques for Gaussian processes, which allow one to express posterior random functions as prior random functions plus a dependent update term. We introduce a wide class of efficient approximations built from this viewpoint, which can be randomly sampled once in advance, and evaluated at arbitrary locations without any subsequent stochasticity. This key property improves efficiency and makes it simpler to deploy Gaussian process models in decision-making settings. Secondly, we develop a collection of Gaussian process models over non-Euclidean spaces, including Riemannian manifolds and graphs. We derive fully constructive expressions for the covariance kernels of scalar-valued Gaussian processes on Riemannian manifolds and graphs. Building on these ideas, we describe a formalism for defining vector-valued Gaussian processes on Riemannian manifolds. The introduced techniques allow all of these models to be trained using standard computational methods. In total, these contributions make Gaussian processes easier to work with and allow them to be used within a wider class of domains in an effective and principled manner. This, in turn, makes it possible to potentially apply Gaussian processes to novel decision-making settings.
    Model-Based Safe Policy Search from Signal Temporal Logic Specifications Using Recurrent Neural Networks. (arXiv:2103.15938v2 [eess.SY] UPDATED)
    We propose a policy search approach to learn controllers from specifications given as Signal Temporal Logic (STL) formulae. The system model, which is unknown but assumed to be an affine control system, is learned together with the control policy. The model is implemented as two feedforward neural networks (FNNs) - one for the drift, and one for the control directions. To capture the history dependency of STL specifications, we use a recurrent neural network (RNN) to implement the control policy. In contrast to prevalent model-free methods, the learning approach proposed here takes advantage of the learned model and is more efficient. We use control barrier functions (CBFs) with the learned model to improve the safety of the system. We validate our algorithm via simulations and experiments. The results show that our approach can satisfy the given specification within very few system runs, and can be used for on-line control.
    Compositional Federated Learning for Distributionally Robust and Meta Learning. (arXiv:2106.11264v2 [cs.LG] UPDATED)
    In the paper, we propose an effective and efficient Compositional Federated Learning (ComFedL) algorithm for solving a new compositional Federated Learning (FL) framework, which frequently appears in many data mining and machine learning problems with a hierarchical structure such as distributionally robust FL and model-agnostic meta learning (MAML). Moreover, we study the convergence analysis of our ComFedL algorithm under some mild conditions, and prove that it achieves a convergence rate of $O(\frac{1}{\sqrt{T}})$, where $T$ denotes the number of iteration. To the best of our knowledge, our new Compositional FL framework is the first work to bridge federated learning with composition stochastic optimization. In particular, we first transform the distributionally robust FL (i.e., a minimax optimization problem) into a simple composition optimization problem by using KL divergence regularization. At the same time, we also first transform the distribution-agnostic MAML problem (i.e., a minimax optimization problem) into a simple yet effective composition optimization problem. Finally, we apply two popular machine learning tasks, i.e., distributionally robust FL and MAML to demonstrate the effectiveness of our algorithm.
    ES-ENAS: Blackbox Optimization over Hybrid Spaces via Combinatorial and Continuous Evolution. (arXiv:2101.07415v5 [cs.LG] UPDATED)
    In this paper, we approach the problem of optimizing blackbox functions over large hybrid search spaces consisting of both combinatorial and continuous parameters. We demonstrate that previous evolutionary algorithms which rely on mutation-based approaches, while flexible over combinatorial spaces, suffer from a curse of dimensionality in high dimensional continuous spaces both theoretically and empirically, which thus limits their scope over hybrid search spaces as well. In order to combat this curse, we propose ES-ENAS, a simple and modular joint optimization procedure combining the class of sample-efficient smoothed gradient gradient techniques, commonly known as Evolutionary Strategies (ES), with combinatorial optimizers in a highly scalable and intuitive way, inspired by the one-shot or supernet paradigm introduced in Efficient Neural Architecture Search (ENAS). By doing so, we achieve significantly more sample efficiency, which we empirically demonstrate over synthetic benchmarks, and are further able to apply ES-ENAS for architecture search over popular RL benchmarks.
    Classifier Calibration: with application to threat scores in cybersecurity. (arXiv:2102.05143v3 [cs.LG] UPDATED)
    This paper explores the calibration of a classifier output score in binary classification problems. A calibrator is a function that maps the arbitrary classifier score, of a testing observation, onto $[0,1]$ to provide an estimate for the posterior probability of belonging to one of the two classes. Calibration is important for two reasons; first, it provides a meaningful score, that is the posterior probability; second, it puts the scores of different classifiers on the same scale for comparable interpretation. The paper presents three main contributions: (1) Introducing multi-score calibration, when more than one classifier provides a score for a single observation. (2) Introducing the idea that the classifier scores to a calibration process are nothing but features to a classifier, hence proposing expanding the classifier scores to higher dimensions to boost the calibrator's performance. (3) Conducting a massive simulation study, in the order of 24,000 experiments, that incorporates different configurations, in addition to experimenting on two real datasets from the cybersecurity domain. The results show that there is no overall winner among the different calibrators and different configurations. However, general advices for practitioners include the following: the Platt's calibrator~\citep{Platt1999ProbabilisticOutputsForSupport}, a version of the logistic regression that decreases bias for a small sample size, has a very stable and acceptable performance among all experiments; our suggested multi-score calibration provides better performance than single score calibration in the majority of experiments, including the two real datasets. In addition, expanding the scores can help in some experiments.
    Consistent Relative Confidence and Label-Free Model Selection for Convolutional Neural Networks. (arXiv:2108.11845v7 [cs.CV] UPDATED)
    This letter is concerned with image classification with deep convolutional neural networks (CNNs). The focus is on the following question: given a set of candidate CNN models, how to select the right one with the best generalization property for the current task? Present model selection methods require access to a batch of labeled data for computing a pre-specified performance metric, such as the cross-entropy loss, the classification error rate, the negative log-likelihood. In many practical cases, labels are not available in time as labeling itself is a time-consuming and expensive task. To this end, this letter presents an approach to CNN model selection using only unlabeled data. This method is developed based on a principle termed consistent relative confidence. The effectiveness and efficiency of the proposed method are demonstrated by experiments using benchmark datasets.
    Self-organizing Democratized Learning: Towards Large-scale Distributed Learning Systems. (arXiv:2007.03278v3 [cs.LG] UPDATED)
    Emerging cross-device artificial intelligence (AI) applications require a transition from conventional centralized learning systems towards large-scale distributed AI systems that can collaboratively perform complex learning tasks. In this regard, democratized learning (Dem-AI) lays out a holistic philosophy with underlying principles for building large-scale distributed and democratized machine learning systems. The outlined principles are meant to study a generalization in distributed learning systems that goes beyond existing mechanisms such as federated learning. Moreover, such learning systems rely on hierarchical self-organization of well-connected distributed learning agents who have limited and highly personalized data and can evolve and regulate themselves based on the underlying duality of specialized and generalized processes. Inspired by Dem-AI philosophy, a novel distributed learning approach is proposed in this paper. The approach consists of a self-organizing hierarchical structuring mechanism based on agglomerative clustering, hierarchical generalization, and corresponding learning mechanism. Subsequently, hierarchical generalized learning problems in recursive forms are formulated and shown to be approximately solved using the solutions of distributed personalized learning problems and hierarchical update mechanisms. To that end, a distributed learning algorithm, namely DemLearn is proposed. Extensive experiments on benchmark MNIST, Fashion-MNIST, FE-MNIST, and CIFAR-10 datasets show that the proposed algorithms demonstrate better results in the generalization performance of learning models in agents compared to the conventional FL algorithms. The detailed analysis provides useful observations to further handle both the generalization and specialization performance of the learning models in Dem-AI systems.
    Quality Inference in Federated Learning with Secure Aggregation. (arXiv:2007.06236v3 [cs.LG] UPDATED)
    Federated learning algorithms are developed both for efficiency reasons and to ensure the privacy and confidentiality of personal and business data, respectively. Despite no data being shared explicitly, recent studies showed that the mechanism could still leak sensitive information. Hence, secure aggregation is utilized in many real-world scenarios to prevent attribution to specific participants. In this paper, we focus on the quality of individual training datasets and show that such quality information could be inferred and attributed to specific participants even when secure aggregation is applied. Specifically, through a series of image recognition experiments, we infer the relative quality ordering of participants. Moreover, we apply the inferred quality information to detect misbehaviours, to stabilize training performance, and to measure the individual contributions of participants.
    Computer Vision for Road Imaging and Pothole Detection: A State-of-the-Art Review of Systems and Algorithms. (arXiv:2204.13590v1 [cs.CV])
    Computer vision algorithms have been prevalently utilized for 3-D road imaging and pothole detection for over two decades. Nonetheless, there is a lack of systematic survey articles on state-of-the-art (SoTA) computer vision techniques, especially deep learning models, developed to tackle these problems. This article first introduces the sensing systems employed for 2-D and 3-D road data acquisition, including camera(s), laser scanners, and Microsoft Kinect. Afterward, it thoroughly and comprehensively reviews the SoTA computer vision algorithms, including (1) classical 2-D image processing, (2) 3-D point cloud modeling and segmentation, and (3) machine/deep learning, developed for road pothole detection. This article also discusses the existing challenges and future development trends of computer vision-based road pothole detection approaches: classical 2-D image processing-based and 3-D point cloud modeling and segmentation-based approaches have already become history; and Convolutional neural networks (CNNs) have demonstrated compelling road pothole detection results and are promising to break the bottleneck with the future advances in self/un-supervised learning for multi-modal semantic segmentation. We believe that this survey can serve as practical guidance for developing the next-generation road condition assessment systems.
    Russian Texts Detoxification with Levenshtein Editing. (arXiv:2204.13638v1 [cs.CL])
    Text detoxification is a style transfer task of creating neutral versions of toxic texts. In this paper, we use the concept of text editing to build a two-step tagging-based detoxification model using a parallel corpus of Russian texts. With this model, we achieved the best style transfer accuracy among all models in the RUSSE Detox shared task, surpassing larger sequence-to-sequence models.
    Nonbacktracking spectral clustering of nonuniform hypergraphs. (arXiv:2204.13586v1 [cs.SI])
    Spectral methods offer a tractable, global framework for clustering in graphs via eigenvector computations on graph matrices. Hypergraph data, in which entities interact on edges of arbitrary size, poses challenges for matrix representations and therefore for spectral clustering. We study spectral clustering for nonuniform hypergraphs based on the hypergraph nonbacktracking operator. After reviewing the definition of this operator and its basic properties, we prove a theorem of Ihara-Bass type to enable faster computation of eigenpairs. We then propose an alternating algorithm for inference in a hypergraph stochastic blockmodel via linearized belief-propagation, offering proofs that both formalize and extend several previous results. We perform experiments in real and synthetic data that underscore the benefits of hypergraph methods over graph-based ones when interactions of different sizes carry different information about cluster structure. Through an analysis of our algorithm, we pose several conjectures about the limits of spectral methods and detectability in hypergraph stochastic blockmodels writ large.
    PhysioGAN: Training High Fidelity Generative Model for Physiological Sensor Readings. (arXiv:2204.13597v1 [eess.SP])
    Generative models such as the variational autoencoder (VAE) and the generative adversarial networks (GAN) have proven to be incredibly powerful for the generation of synthetic data that preserves statistical properties and utility of real-world datasets, especially in the context of image and natural language text. Nevertheless, until now, there has no successful demonstration of how to apply either method for generating useful physiological sensory data. The state-of-the-art techniques in this context have achieved only limited success. We present PHYSIOGAN, a generative model to produce high fidelity synthetic physiological sensor data readings. PHYSIOGAN consists of an encoder, decoder, and a discriminator. We evaluate PHYSIOGAN against the state-of-the-art techniques using two different real-world datasets: ECG classification and activity recognition from motion sensors datasets. We compare PHYSIOGAN to the baseline models not only the accuracy of class conditional generation but also the sample diversity and sample novelty of the synthetic datasets. We prove that PHYSIOGAN generates samples with higher utility than other generative models by showing that classification models trained on only synthetic data generated by PHYSIOGAN have only 10% and 20% decrease in their classification accuracy relative to classification models trained on the real data. Furthermore, we demonstrate the use of PHYSIOGAN for sensor data imputation in creating plausible results.
    Personalized Federated Learning with Multiple Known Clusters. (arXiv:2204.13619v1 [cs.LG])
    We consider the problem of personalized federated learning when there are known cluster structures within users. An intuitive approach would be to regularize the parameters so that users in the same cluster share similar model weights. The distances between the clusters can then be regularized to reflect the similarity between different clusters of users. We develop an algorithm that allows each cluster to communicate independently and derive the convergence results. We study a hierarchical linear model to theoretically demonstrate that our approach outperforms agents learning independently and agents learning a single shared weight. Finally, we demonstrate the advantages of our approach using both simulated and real-world data.
    Zero-Shot Logit Adjustment. (arXiv:2204.11822v2 [cs.CV] UPDATED)
    Semantic-descriptor-based Generalized Zero-Shot Learning (GZSL) poses challenges in recognizing novel classes in the test phase. The development of generative models enables current GZSL techniques to probe further into the semantic-visual link, culminating in a two-stage form that includes a generator and a classifier. However, existing generation-based methods focus on enhancing the generator's effect while neglecting the improvement of the classifier. In this paper, we first analyze of two properties of the generated pseudo unseen samples: bias and homogeneity. Then, we perform variational Bayesian inference to back-derive the evaluation metrics, which reflects the balance of the seen and unseen classes. As a consequence of our derivation, the aforementioned two properties are incorporated into the classifier training as seen-unseen priors via logit adjustment. The Zero-Shot Logit Adjustment further puts semantic-based classifiers into effect in generation-based GZSL. Our experiments demonstrate that the proposed technique achieves state-of-the-art when combined with the basic generator, and it can improve various generative zero-shot learning frameworks. Our codes are available on https://github.com/cdb342/IJCAI-2022-ZLA.
    Continual Learning for Peer-to-Peer Federated Learning: A Study on Automated Brain Metastasis Identification. (arXiv:2204.13591v1 [cs.LG])
    Due to data privacy constraints, data sharing among multiple centers is restricted. Continual learning, as one approach to peer-to-peer federated learning, can promote multicenter collaboration on deep learning algorithm development by sharing intermediate models instead of training data. This work aims to investigate the feasibility of continual learning for multicenter collaboration on an exemplary application of brain metastasis identification using DeepMedic. 920 T1 MRI contrast enhanced volumes are split to simulate multicenter collaboration scenarios. A continual learning algorithm, synaptic intelligence (SI), is applied to preserve important model weights for training one center after another. In a bilateral collaboration scenario, continual learning with SI achieves a sensitivity of 0.917, and naive continual learning without SI achieves a sensitivity of 0.906, while two models trained on internal data solely without continual learning achieve sensitivity of 0.853 and 0.831 only. In a seven-center multilateral collaboration scenario, the models trained on internal datasets (100 volumes each center) without continual learning obtain a mean sensitivity value of 0.725. With single-visit continual learning (i.e., the shared model visits each center only once during training), the sensitivity is improved to 0.788 and 0.849 without SI and with SI, respectively. With iterative continual learning (i.e., the shared model revisits each center multiple times during training), the sensitivity is further improved to 0.914, which is identical to the sensitivity using mixed data for training. Our experiments demonstrate that continual learning can improve brain metastasis identification performance for centers with limited data. This study demonstrates the feasibility of applying continual learning for peer-to-peer federated learning in multicenter collaboration.
    On tuning a mean-field model for semi-supervised classification. (arXiv:2204.13519v1 [cs.LG])
    Semi-supervised learning (SSL) has become an interesting research area due to its capacity for learning in scenarios where both labeled and unlabeled data are available. In this work, we focus on the task of transduction - when the objective is to label all data presented to the learner - with a mean-field approximation to the Potts model. Aiming at this particular task we study how classification results depend on $\beta$ and find that the optimal phase depends highly on the amount of labeled data available. In the same study, we also observe that more stable classifications regarding small fluctuations in $\beta$ are related to configurations of high probability and propose a tuning approach based on such observation. This method relies on a novel parameter $\gamma$ and we then evaluate two different values of the said quantity in comparison with classical methods in the field. This evaluation is conducted by changing the amount of labeled data available and the number of nearest neighbors in the similarity graph. Empirical results show that the tuning method is effective and allows NMF to outperform other approaches in datasets with fewer classes. In addition, one of the chosen values for $\gamma$ also leads to results that are more resilient to changes in the number of neighbors, which might be of interest to practitioners in the field of SSL.
    EVI: Multilingual Spoken Dialogue Tasks and Dataset for Knowledge-Based Enrolment, Verification, and Identification. (arXiv:2204.13496v1 [cs.CL])
    Knowledge-based authentication is crucial for task-oriented spoken dialogue systems that offer personalised and privacy-focused services. Such systems should be able to enrol (E), verify (V), and identify (I) new and recurring users based on their personal information, e.g. postcode, name, and date of birth. In this work, we formalise the three authentication tasks and their evaluation protocols, and we present EVI, a challenging spoken multilingual dataset with 5,506 dialogues in English, Polish, and French. Our proposed models set the first competitive benchmarks, explore the challenges of multilingual natural language processing of spoken dialogue, and set directions for future research.
    Learning General Inventory Management Policy for Large Supply Chain Network. (arXiv:2204.13378v1 [cs.AI])
    Inventory management in warehouses directly affects profits made by manufacturers. Particularly, large manufacturers produce a very large variety of products that are handled by a significantly large number of retailers. In such a case, the computational complexity of classical inventory management algorithms is inordinately large. In recent years, learning-based approaches have become popular for addressing such problems. However, previous studies have not been managed systems where both the number of products and retailers are large. This study proposes a reinforcement learning-based warehouse inventory management algorithm that can be used for supply chain systems where both the number of products and retailers are large. To solve the computational problem of handling large systems, we provide a means of approximate simulation of the system in the training phase. Our experiments on both real and artificial data demonstrate that our algorithm with approximated simulation can successfully handle large supply chain networks.
    Mixup-based Deep Metric Learning Approaches for Incomplete Supervision. (arXiv:2204.13572v1 [cs.LG])
    Deep learning architectures have achieved promising results in different areas (e.g., medicine, agriculture, and security). However, using those powerful techniques in many real applications becomes challenging due to the large labeled collections required during training. Several works have pursued solutions to overcome it by proposing strategies that can learn more for less, e.g., weakly and semi-supervised learning approaches. As these approaches do not usually address memorization and sensitivity to adversarial examples, this paper presents three deep metric learning approaches combined with Mixup for incomplete-supervision scenarios. We show that some state-of-the-art approaches in metric learning might not work well in such scenarios. Moreover, the proposed approaches outperform most of them in different datasets.
    Adversarial Fine-tune with Dynamically Regulated Adversary. (arXiv:2204.13232v1 [cs.LG])
    Adversarial training is an effective method to boost model robustness to malicious, adversarial attacks. However, such improvement in model robustness often leads to a significant sacrifice of standard performance on clean images. In many real-world applications such as health diagnosis and autonomous surgical robotics, the standard performance is more valued over model robustness against such extremely malicious attacks. This leads to the question: To what extent we can boost model robustness without sacrificing standard performance? This work tackles this problem and proposes a simple yet effective transfer learning-based adversarial training strategy that disentangles the negative effects of adversarial samples on model's standard performance. In addition, we introduce a training-friendly adversarial attack algorithm, which facilitates the boost of adversarial robustness without introducing significant training complexity. Extensive experimentation indicates that the proposed method outperforms previous adversarial training algorithms towards the target: to improve model robustness while preserving model's standard performance on clean data.
    Semantic Communication: An Information Bottleneck View. (arXiv:2204.13366v1 [cs.IT])
    Motivated by recent success of machine learning tools at the PHY layer and driven by high bandwidth demands of the next wireless communication standard 6G, the old idea of semantic communication by Weaver from 1949 has received considerable attention. It breaks with the classic design paradigm according to Shannon by aiming to transmit the meaning of a message rather than its exact copy and thus potentially allows for savings in bandwidth. In this work, inspired by Weaver, we propose an information-theoretic framework where the semantic context is explicitly introduced into probabilistic models. In particular, for bandwidth efficient transmission, we define semantic communication system design as an Information Bottleneck optimization problem and consider important implementation aspects. Further, we uncover the restrictions of the classic 5G communication system design w.r.t. semantic context. Notably, based on the example of distributed image classification, we reveal the huge potential of a semantic communication system design. Numerical results show a tremendous saving in bandwidth of 20 dB with our proposed approach ISCNet compared to a classic PHY layer design.
    WeaNF: Weak Supervision with Normalizing Flows. (arXiv:2204.13409v1 [cs.CL])
    A popular approach to decrease the need for costly manual annotation of large data sets is weak supervision, which introduces problems of noisy labels, coverage and bias. Methods for overcoming these problems have either relied on discriminative models, trained with cost functions specific to weak supervision, and more recently, generative models, trying to model the output of the automatic annotation process. In this work, we explore a novel direction of generative modeling for weak supervision: Instead of modeling the output of the annotation process (the labeling function matches), we generatively model the input-side data distributions (the feature space) covered by labeling functions. Specifically, we estimate a density for each weak labeling source, or labeling function, by using normalizing flows. An integral part of our method is the flow-based modeling of multiple simultaneously matching labeling functions, and therefore phenomena such as labeling function overlap and correlations are captured. We analyze the effectiveness and modeling capabilities on various commonly used weak supervision data sets, and show that weakly supervised normalizing flows compare favorably to standard weak supervision baselines.
    Continual Backprop: Stochastic Gradient Descent with Persistent Randomness. (arXiv:2108.06325v2 [cs.LG] UPDATED)
    The Backprop algorithm for learning in neural networks utilizes two mechanisms: first, stochastic gradient descent and second, initialization with small random weights, where the latter is essential to the effectiveness of the former. We show that in continual learning setups, Backprop performs well initially, but over time its performance degrades. Stochastic gradient descent alone is insufficient to learn continually; the initial randomness enables only initial learning but not continual learning. To the best of our knowledge, ours is the first result showing this degradation in Backprop's ability to learn. To address this degradation in Backprop's plasticity, we propose an algorithm that continually injects random features alongside gradient descent using a new generate-and-test process. We call this the \textit{Continual Backprop} algorithm. We show that, unlike Backprop, Continual Backprop is able to continually adapt in both supervised and reinforcement learning (RL) problems. Continual Backprop has the same computational complexity as Backprop and can be seen as a natural extension of Backprop for continual learning.
    Schr\"odinger's FP: Dynamic Adaptation of Floating-Point Containers for Deep Learning Training. (arXiv:2204.13666v1 [cs.LG])
    We introduce a software-hardware co-design approach to reduce memory traffic and footprint during training with BFloat16 or FP32 boosting energy efficiency and execution time performance. We introduce methods to dynamically adjust the size and format of the floating-point containers used to store activations and weights during training. The different value distributions lead us to different approaches for exponents and mantissas. Gecko exploits the favourable exponent distribution with a loss-less delta encoding approach to reduce the total exponent footprint by up to $58\%$ in comparison to a 32 bit floating point baseline. To content with the noisy mantissa distributions, we present two lossy methods to eliminate as many as possible least significant bits while not affecting accuracy. Quantum Mantissa, is a machine learning-first mantissa compression method that taps on training's gradient descent algorithm to also learn minimal mantissa bitlengths on a per-layer granularity, and obtain up to $92\%$ reduction in total mantissa footprint. Alternatively, BitChop observes changes in the loss function during training to adjust mantissa bit-length network-wide yielding a reduction of $81\%$ in footprint. Schr\"{o}dinger's FP implements hardware encoders/decoders that guided by Gecko/Quantum Mantissa or Gecko/BitChop transparently encode/decode values when transferring to/from off-chip memory boosting energy efficiency and reducing execution time.
    Interpretable Graph Convolutional Network of Multi-Modality Brain Imaging for Alzheimer's Disease Diagnosis. (arXiv:2204.13188v1 [cs.LG])
    Identification of brain regions related to the specific neurological disorders are of great importance for biomarker and diagnostic studies. In this paper, we propose an interpretable Graph Convolutional Network (GCN) framework for the identification and classification of Alzheimer's disease (AD) using multi-modality brain imaging data. Specifically, we extended the Gradient Class Activation Mapping (Grad-CAM) technique to quantify the most discriminative features identified by GCN from brain connectivity patterns. We then utilized them to find signature regions of interest (ROIs) by detecting the difference of features between regions in healthy control (HC), mild cognitive impairment (MCI), and AD groups. We conducted the experiments on the ADNI database with imaging data from three modalities, including VBM-MRI, FDG-PET, and AV45-PET, and showed that the ROI features learned by our method were effective for enhancing the performances of both clinical score prediction and disease status identification. It also successfully identified biomarkers associated with AD and MCI.  ( 2 min )
    BAGNet: Bidirectional Aware Guidance Network for Malignant Breast lesions Segmentation. (arXiv:2204.13342v1 [eess.IV])
    Breast lesions segmentation is an important step of computer-aided diagnosis system, and it has attracted much attention. However, accurate segmentation of malignant breast lesions is a challenging task due to the effects of heterogeneous structure and similar intensity distributions. In this paper, a novel bidirectional aware guidance network (BAGNet) is proposed to segment the malignant lesion from breast ultrasound images. Specifically, the bidirectional aware guidance network is used to capture the context between global (low-level) and local (high-level) features from the input coarse saliency map. The introduction of the global feature map can reduce the interference of surrounding tissue (background) on the lesion regions. To evaluate the segmentation performance of the network, we compared with several state-of-the-art medical image segmentation methods on the public breast ultrasound dataset using six commonly used evaluation metrics. Extensive experimental results indicate that our method achieves the most competitive segmentation results on malignant breast ultrasound images.  ( 2 min )
    Towards Flexible Inference in Sequential Decision Problems via Bidirectional Transformers. (arXiv:2204.13326v1 [cs.LG])
    Randomly masking and predicting word tokens has been a successful approach in pre-training language models for a variety of downstream tasks. In this work, we observe that the same idea also applies naturally to sequential decision making, where many well-studied tasks like behavior cloning, offline RL, inverse dynamics, and waypoint conditioning correspond to different sequence maskings over a sequence of states, actions, and returns. We introduce the FlexiBiT framework, which provides a unified way to specify models which can be trained on many different sequential decision making tasks. We show that a single FlexiBiT model is simultaneously capable of carrying out many tasks with performance similar to or better than specialized models. Additionally, we show that performance can be further improved by fine-tuning our general model on specific tasks of interest.  ( 2 min )
    Offline Visual Representation Learning for Embodied Navigation. (arXiv:2204.13226v1 [cs.CV])
    How should we learn visual representations for embodied agents that must see and move? The status quo is tabula rasa in vivo, i.e. learning visual representations from scratch while also learning to move, potentially augmented with auxiliary tasks (e.g. predicting the action taken between two successive observations). In this paper, we show that an alternative 2-stage strategy is far more effective: (1) offline pretraining of visual representations with self-supervised learning (SSL) using large-scale pre-rendered images of indoor environments (Omnidata), and (2) online finetuning of visuomotor representations on specific tasks with image augmentations under long learning schedules. We call this method Offline Visual Representation Learning (OVRL). We conduct large-scale experiments - on 3 different 3D datasets (Gibson, HM3D, MP3D), 2 tasks (ImageNav, ObjectNav), and 2 policy learning algorithms (RL, IL) - and find that the OVRL representations lead to significant across-the-board improvements in state of art, on ImageNav from 29.2% to 54.2% (+25% absolute, 86% relative) and on ObjectNav from 18.1% to 23.2% (+5.1% absolute, 28% relative). Importantly, both results were achieved by the same visual encoder generalizing to datasets that were not seen during pretraining. While the benefits of pretraining sometimes diminish (or entirely disappear) with long finetuning schedules, we find that OVRL's performance gains continue to increase (not decrease) as the agent is trained for 2 billion frames of experience.  ( 2 min )
    Anomaly Detection by Leveraging Incomplete Anomalous Knowledge with Anomaly-Aware Bidirectional GANs. (arXiv:2204.13335v1 [cs.LG])
    The goal of anomaly detection is to identify anomalous samples from normal ones. In this paper, a small number of anomalies are assumed to be available at the training stage, but they are assumed to be collected only from several anomaly types, leaving the majority of anomaly types not represented in the collected anomaly dataset at all. To effectively leverage this kind of incomplete anomalous knowledge represented by the collected anomalies, we propose to learn a probability distribution that can not only model the normal samples, but also guarantee to assign low density values for the collected anomalies. To this end, an anomaly-aware generative adversarial network (GAN) is developed, which, in addition to modeling the normal samples as most GANs do, is able to explicitly avoid assigning probabilities for collected anomalous samples. Moreover, to facilitate the computation of anomaly detection criteria like reconstruction error, the proposed anomaly-aware GAN is designed to be bidirectional, attaching an encoder for the generator. Extensive experimental results demonstrate that our proposed method is able to effectively make use of the incomplete anomalous information, leading to significant performance gains compared to existing methods.  ( 2 min )
    TransHER: Translating Knowledge Graph Embedding with Hyper-Ellipsoidal Restriction. (arXiv:2204.13221v1 [cs.AI])
    Knowledge graph embedding methods are important for knowledge graph completion (link prediction) due to their robust performance and efficiency on large-magnitude datasets. One state-of-the-art method, PairRE, leverages two separate vectors for relations to model complex relations (i.e., 1-to-N, N-to-1, and N-to-N) in knowledge graphs. However, such a method strictly restricts entities on the hyper-ellipsoid surface and thus limits the optimization of entity distribution, which largely hinders the performance of knowledge graph completion. To address this problem, we propose a novel score function TransHER, which leverages relation-specific translations between head and tail entities restricted on separate hyper-ellipsoids. Specifically, given a triplet, our model first maps entities onto two separate hyper-ellipsoids and then conducts a relation-specific translation on one of them. The relation-specific translation provides TransHER with more direct guidance in optimization and the ability to learn semantic characteristics of entities with complex relations. Experimental results show that TransHER can achieve state-of-the-art performance and generalize to datasets in different domains and scales. All our code will be publicly available.  ( 2 min )
    Open challenges for Machine Learning based Early Decision-Making research. (arXiv:2204.13111v1 [cs.LG])
    More and more applications require early decisions, i.e. taken as soon as possible from partially observed data. However, the later a decision is made, the more its accuracy tends to improve, since the description of the problem to hand is enriched over time. Such a compromise between the earliness and the accuracy of decisions has been particularly studied in the field of Early Time Series Classification. This paper introduces a more general problem, called Machine Learning based Early Decision Making (ML-EDM), which consists in optimizing the decision times of models in a wide range of settings where data is collected over time. After defining the ML-EDM problem, ten challenges are identified and proposed to the scientific community to further research in this area. These challenges open important application perspectives, discussed in this paper.  ( 2 min )
    Covariance-aware Feature Alignment with Pre-computed Source Statistics for Test-time Adaptation. (arXiv:2204.13263v1 [cs.LG])
    The accuracy of deep neural networks is degraded when the distribution of features in the test environment (target domain) differs from that of the training (source) environment. To mitigate the degradation, test-time adaptation (TTA), where a model adapts to the target domain without access to the source dataset, can be used in the test environment. However, the existing TTA methods lack feature distribution alignment between the source and target domains, which unsupervised domain adaptation mainly addresses, because accessing the source dataset is prohibited in the TTA setting. In this paper, we propose a novel TTA method, named Covariance-Aware Feature alignment (CAFe), which explicitly aligns the source and target feature distributions at test time. To perform alignment without accessing the source data, CAFe uses auxiliary feature statistics (mean and covariance) pre-computed on the source domain, which are lightweight and easily prepared. Further, to improve efficiency and stability, we propose feature grouping, which splits the feature dimensions into groups according to their correlations by using spectral clustering to avoid degeneration of the covariance matrix. We empirically show that CAFe outperforms prior TTA methods on a variety of distribution shifts.  ( 2 min )
    Counterfactual Explanations for Natural Language Interfaces. (arXiv:2204.13192v1 [cs.CL])
    A key challenge facing natural language interfaces is enabling users to understand the capabilities of the underlying system. We propose a novel approach for generating explanations of a natural language interface based on semantic parsing. We focus on counterfactual explanations, which are post-hoc explanations that describe to the user how they could have minimally modified their utterance to achieve their desired goal. In particular, the user provides an utterance along with a demonstration of their desired goal; then, our algorithm synthesizes a paraphrase of their utterance that is guaranteed to achieve their goal. In two user studies, we demonstrate that our approach substantially improves user performance, and that it generates explanations that more closely match the user's intent compared to two ablations.  ( 2 min )
    On the Convergence of Momentum-Based Algorithms for Federated Stochastic Bilevel Optimization Problems. (arXiv:2204.13299v1 [cs.LG])
    In this paper, we studied the federated stochastic bilevel optimization problem. In particular, we developed two momentum-based algorithms for optimizing this kind of problem. In addition, we established the convergence rate of these two algorithms, providing their sample and communication complexities. To the best of our knowledge, this is the first work achieving such favorable theoretical results.  ( 2 min )
    Watts: Infrastructure for Open-Ended Learning. (arXiv:2204.13250v1 [cs.AI])
    This paper proposes a framework called Watts for implementing, comparing, and recombining open-ended learning (OEL) algorithms. Motivated by modularity and algorithmic flexibility, Watts atomizes the components of OEL systems to promote the study of and direct comparisons between approaches. Examining implementations of three OEL algorithms, the paper introduces the modules of the framework. The hope is for Watts to enable benchmarking and to explore new types of OEL algorithms. The repo is available at \url{https://github.com/aadharna/watts}  ( 2 min )
    R-MBO: A Multi-surrogate Approach for Preference Incorporation in Multi-objective Bayesian Optimisation. (arXiv:2204.13166v1 [stat.ML])
    Many real-world multi-objective optimisation problems rely on computationally expensive function evaluations. Multi-objective Bayesian optimisation (BO) can be used to alleviate the computation time to find an approximated set of Pareto optimal solutions. In many real-world problems, a decision-maker has some preferences on the objective functions. One approach to incorporate the preferences in multi-objective BO is to use a scalarising function and build a single surrogate model (mono-surrogate approach) on it. This approach has two major limitations. Firstly, the fitness landscape of the scalarising function and the objective functions may not be similar. Secondly, the approach assumes that the scalarising function distribution is Gaussian, and thus a closed-form expression of an acquisition function e.g., expected improvement can be used. We overcome these limitations by building independent surrogate models (multi-surrogate approach) on each objective function and show that the distribution of the scalarising function is not Gaussian. We approximate the distribution using Generalised value distribution. We present an a-priori multi-surrogate approach to incorporate the desirable objective function values (or reference point) as the preferences of a decision-maker in multi-objective BO. The results and comparison with the existing mono-surrogate approach on benchmark and real-world optimisation problems show the potential of the proposed approach.  ( 2 min )
  • Open

    Exchangeability-Aware Sum-Product Networks. (arXiv:2110.05165v2 [cs.LG] UPDATED)
    Sum-Product Networks (SPNs) are expressive probabilistic models that provide exact, tractable inference. They achieve this efficiency by making use of local independence. On the other hand, mixtures of exchangeable variable models (MEVMs) are a class of tractable probabilistic models that make use of exchangeability of discrete random variables to render inference tractable. Exchangeability, which arises naturally in relational domains, has not been considered for efficient representation and inference in SPNs yet. The contribution of this paper is a novel probabilistic model which we call Exchangeability-Aware Sum-Product Networks (XSPNs). It contains both SPNs and MEVMs as special cases, and combines the ability of SPNs to efficiently learn deep probabilistic models with the ability of MEVMs to efficiently handle exchangeable random variables. We introduce a structure learning algorithm for XSPNs and empirically show that they can be more accurate than conventional SPNs when the data contains repeated, interchangeable parts.  ( 2 min )
    Performance analysis of greedy algorithms for minimising a Maximum Mean Discrepancy. (arXiv:2101.07564v2 [stat.ML] UPDATED)
    We analyse the performance of several iterative algorithms for the quantisation of a probability measure $\mu$, based on the minimisation of a Maximum Mean Discrepancy (MMD). Our analysis includes kernel herding, greedy MMD minimisation and Sequential Bayesian Quadrature (SBQ). We show that the finite-sample-size approximation error, measured by the MMD, decreases as $1/n$ for SBQ and also for kernel herding and greedy MMD minimisation when using a suitable step-size sequence. The upper bound on the approximation error is slightly better for SBQ, but the other methods are significantly faster, with a computational cost that increases only linearly with the number of points selected. This is illustrated by two numerical examples, with the target measure $\mu$ being uniform (a space-filling design application) and with $\mu$ a Gaussian mixture. They suggest that the bounds derived in the paper are overly pessimistic, in particular for SBQ. The sources of this pessimism are identified but seem difficult to counter.  ( 2 min )
    Tracking Most Significant Arm Switches in Bandits. (arXiv:2112.13838v5 [cs.LG] UPDATED)
    In bandit with distribution shifts, one aims to automatically adapt to unknown changes in reward distribution, and restart exploration when necessary. While this problem has been studied for many years, a recent breakthrough of Auer et al. (2018, 2019) provides the first adaptive procedure to guarantee an optimal (dynamic) regret $\sqrt{LT}$, for $T$ rounds, and an unknown number $L$ of changes. However, while this rate is tight in the worst case, it remained open whether faster rates are possible, without prior knowledge, if few changes in distribution are actually severe. To resolve this question, we propose a new notion of significant shift, which only counts very severe changes that clearly necessitate a restart: roughly, these are changes involving not only best arm switches, but also involving large aggregate differences in reward overtime. Thus, our resulting procedure adaptively achieves rates always faster (sometimes significantly) than $O(\sqrt{ST})$, where $S\ll L$ only counts best arm switches, while at the same time, always faster than the optimal $O(V^{\frac{1}{3}}T^{\frac{2}{3}})$ when expressed in terms of total variation $V$ (which aggregates differences overtime). Our results are expressed in enough generality to also capture non-stochastic adversarial settings.  ( 2 min )
    Variational Inference with NoFAS: Normalizing Flow with Adaptive Surrogate for Computationally Expensive Models. (arXiv:2108.12657v2 [cs.LG] UPDATED)
    Fast inference of numerical model parameters from data is an important prerequisite to generate predictive models for a wide range of applications. Use of sampling-based approaches such as Markov chain Monte Carlo may become intractable when each likelihood evaluation is computationally expensive. New approaches combining variational inference with normalizing flow are characterized by a computational cost that grows only linearly with the dimensionality of the latent variable space, and rely on gradient-based optimization instead of sampling, providing a more efficient approach for Bayesian inference about the model parameters. Moreover, the cost of frequently evaluating an expensive likelihood can be mitigated by replacing the true model with an offline trained surrogate model, such as neural networks. However, this approach might generate significant bias when the surrogate is insufficiently accurate around the posterior modes. To reduce the computational cost without sacrificing inferential accuracy, we propose Normalizing Flow with Adaptive Surrogate (NoFAS), an optimization strategy that alternatively updates the normalizing flow parameters and surrogate model parameters. We also propose an efficient sample weighting scheme for surrogate model training that preserves global accuracy while effectively capturing high posterior density regions. We demonstrate the inferential and computational superiority of NoFAS against various benchmarks, including cases where the underlying model lacks identifiability. The source code and numerical experiments used for this study are available at https://github.com/cedricwangyu/NoFAS.  ( 2 min )
    Gaussian Processes and Statistical Decision-making in Non-Euclidean Spaces. (arXiv:2202.10613v3 [stat.ML] UPDATED)
    Bayesian learning using Gaussian processes provides a foundational framework for making decisions in a manner that balances what is known with what could be learned by gathering data. In this dissertation, we develop techniques for broadening the applicability of Gaussian processes. This is done in two ways. Firstly, we develop pathwise conditioning techniques for Gaussian processes, which allow one to express posterior random functions as prior random functions plus a dependent update term. We introduce a wide class of efficient approximations built from this viewpoint, which can be randomly sampled once in advance, and evaluated at arbitrary locations without any subsequent stochasticity. This key property improves efficiency and makes it simpler to deploy Gaussian process models in decision-making settings. Secondly, we develop a collection of Gaussian process models over non-Euclidean spaces, including Riemannian manifolds and graphs. We derive fully constructive expressions for the covariance kernels of scalar-valued Gaussian processes on Riemannian manifolds and graphs. Building on these ideas, we describe a formalism for defining vector-valued Gaussian processes on Riemannian manifolds. The introduced techniques allow all of these models to be trained using standard computational methods. In total, these contributions make Gaussian processes easier to work with and allow them to be used within a wider class of domains in an effective and principled manner. This, in turn, makes it possible to potentially apply Gaussian processes to novel decision-making settings.  ( 2 min )
    Partitioned Variational Inference: A Framework for Probabilistic Federated Learning. (arXiv:2202.12275v4 [stat.ML] UPDATED)
    The proliferation of computing devices has brought about an opportunity to deploy machine learning models on new problem domains using previously inaccessible data. Traditional algorithms for training such models often require data to be stored on a single machine with compute performed by a single node, making them unsuitable for decentralised training on multiple devices. This deficiency has motivated the development of federated learning algorithms, which allow multiple data owners to train collaboratively and use a shared model whilst keeping local data private. However, many of these algorithms focus on obtaining point estimates of model parameters, rather than probabilistic estimates capable of capturing model uncertainty, which is essential in many applications. Variational inference (VI) has become the method of choice for fitting many modern probabilistic models. In this paper we introduce partitioned variational inference (PVI), a general framework for performing VI in the federated setting. We develop new supporting theory for PVI, demonstrating a number of properties that make it an attractive choice for practitioners; use PVI to unify a wealth of fragmented, yet related literature; and provide empirical results that showcase the effectiveness of PVI in a variety of federated settings.  ( 2 min )
    Normalizing flows for atomic solids. (arXiv:2111.08696v2 [physics.comp-ph] UPDATED)
    We present a machine-learning approach, based on normalizing flows, for modelling atomic solids. Our model transforms an analytically tractable base distribution into the target solid without requiring ground-truth samples for training. We report Helmholtz free energy estimates for cubic and hexagonal ice modelled as monatomic water as well as for a truncated and shifted Lennard-Jones system, and find them to be in excellent agreement with literature values and with estimates from established baseline methods. We further investigate structural properties and show that the model samples are nearly indistinguishable from the ones obtained with molecular dynamics. Our results thus demonstrate that normalizing flows can provide high-quality samples and free energy estimates without the need for multi-staging.  ( 2 min )
    Forecasting Brain Activity Based on Models of Spatio-Temporal Brain Dynamics: A Comparison of Graph Neural Network Architectures. (arXiv:2112.04266v2 [q-bio.NC] UPDATED)
    Comprehending the interplay between spatial and temporal characteristics of neural dynamics can contribute to our understanding of information processing in the human brain. Graph neural networks (GNNs) provide a new possibility to interpret graph structured signals like those observed in complex brain networks. In our study we compare different spatio-temporal GNN architectures and study their ability to model neural activity distributions obtained in functional MRI (fMRI) studies. We evaluate the performance of the GNN models on a variety of scenarios in MRI studies and also compare it to a VAR model, which is currently often used for directed functional connectivity analysis. We show that by learning localized functional interactions on the anatomical substrate, GNN based approaches are able to robustly scale to large network studies, even when available data are scarce. By including anatomical connectivity as the physical substrate for information propagation, such GNNs also provide a multi-modal perspective on directed connectivity analysis, offering a novel possibility to investigate the spatio-temporal dynamics in brain networks.  ( 2 min )
    Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control. (arXiv:2110.01052v4 [cs.LG] UPDATED)
    We introduce a framework for calibrating machine learning models so that their predictions satisfy explicit, finite-sample statistical guarantees. Our calibration algorithm works with any underlying model and (unknown) data-generating distribution and does not require model refitting. The framework addresses, among other examples, false discovery rate control in multi-label classification, intersection-over-union control in instance segmentation, and the simultaneous control of the type-1 error of outlier detection and confidence set coverage in classification or regression. Our main insight is to reframe the risk-control problem as multiple hypothesis testing, enabling techniques and mathematical arguments different from those in the previous literature. We use our framework to provide new calibration methods for several core machine learning tasks with detailed worked examples in computer vision and tabular medical data.  ( 2 min )
    Self-organizing Democratized Learning: Towards Large-scale Distributed Learning Systems. (arXiv:2007.03278v3 [cs.LG] UPDATED)
    Emerging cross-device artificial intelligence (AI) applications require a transition from conventional centralized learning systems towards large-scale distributed AI systems that can collaboratively perform complex learning tasks. In this regard, democratized learning (Dem-AI) lays out a holistic philosophy with underlying principles for building large-scale distributed and democratized machine learning systems. The outlined principles are meant to study a generalization in distributed learning systems that goes beyond existing mechanisms such as federated learning. Moreover, such learning systems rely on hierarchical self-organization of well-connected distributed learning agents who have limited and highly personalized data and can evolve and regulate themselves based on the underlying duality of specialized and generalized processes. Inspired by Dem-AI philosophy, a novel distributed learning approach is proposed in this paper. The approach consists of a self-organizing hierarchical structuring mechanism based on agglomerative clustering, hierarchical generalization, and corresponding learning mechanism. Subsequently, hierarchical generalized learning problems in recursive forms are formulated and shown to be approximately solved using the solutions of distributed personalized learning problems and hierarchical update mechanisms. To that end, a distributed learning algorithm, namely DemLearn is proposed. Extensive experiments on benchmark MNIST, Fashion-MNIST, FE-MNIST, and CIFAR-10 datasets show that the proposed algorithms demonstrate better results in the generalization performance of learning models in agents compared to the conventional FL algorithms. The detailed analysis provides useful observations to further handle both the generalization and specialization performance of the learning models in Dem-AI systems.  ( 2 min )
    Adversarial Meta-Learning of Gamma-Minimax Estimators That Leverage Prior Knowledge. (arXiv:2012.05465v2 [stat.ME] UPDATED)
    Bayes estimators are well known to provide a means to incorporate prior knowledge that can be expressed in terms of a single prior distribution. However, when this knowledge is too vague to express with a single prior, an alternative approach is needed. Gamma-minimax estimators provide such an approach. These estimators minimize the worst-case Bayes risk over a set $\Gamma$ of prior distributions that are compatible with the available knowledge. Traditionally, Gamma-minimaxity is defined for parametric models. In this work, we define Gamma-minimax estimators for general models and propose adversarial meta-learning algorithms to compute them when the set of prior distributions is constrained by generalized moments. Accompanying convergence guarantees are also provided. We also introduce a neural network class that provides a rich, but finite-dimensional, class of estimators from which a Gamma-minimax estimator can be selected. We illustrate our method in two settings, namely entropy estimation and a prediction problem that arises in biodiversity studies.  ( 2 min )
    Signal Recovery with Non-Expansive Generative Network Priors. (arXiv:2204.13599v1 [eess.SP])
    We study compressive sensing with a deep generative network prior. Initial theoretical guarantees for efficient recovery from compressed linear measurements have been developed for signals in the range of a ReLU network with Gaussian weights and logarithmic expansivity: that is when each layer is larger than the previous one by a logarithmic factor. It was later shown that constant expansivity is sufficient for recovery. It has remained open whether the expansivity can be relaxed allowing for networks with contractive layers, as often the case of real generators. In this work we answer this question, proving that a signal in the range of a Gaussian generative network can be recovered from a few linear measurements provided that the width of the layers is proportional to the input layer size (up to log factors). This condition allows the generative network to have contractive layers. Our result is based on showing that Gaussian matrices satisfy a matrix concentration inequality, which we term Range Restricted Weight Distribution Condition (R2WDC), and weakens the Weight Distribution Condition (WDC) upon which previous theoretical guarantees were based on. The WDC has also been used to analyze other signal recovery problems with generative network priors. By replacing the WDC with the R2WDC, we are able to extend previous results for signal recovery with expansive generative network priors to non-expansive ones. We discuss these extensions for phase retrieval, denoising, and spiked matrix recovery.  ( 2 min )
    Multiplicative Updates for NMF with $\beta$-Divergences under Disjoint Equality Constraints. (arXiv:2010.16223v2 [cs.LG] UPDATED)
    Nonnegative matrix factorization (NMF) is the problem of approximating an input nonnegative matrix, $V$, as the product of two smaller nonnegative matrices, $W$ and $H$. In this paper, we introduce a general framework to design multiplicative updates (MU) for NMF based on $\beta$-divergences ($\beta$-NMF) with disjoint equality constraints, and with penalty terms in the objective function. By disjoint, we mean that each variable appears in at most one equality constraint. Our MU satisfy the set of constraints after each update of the variables during the optimization process, while guaranteeing that the objective function decreases monotonically. We showcase this framework on three NMF models, and show that it competes favorably the state of the art: (1)~$\beta$-NMF with sum-to-one constraints on the columns of $H$, (2) minimum-volume $\beta$-NMF with sum-to-one constraints on the columns of $W$, and (3) sparse $\beta$-NMF with $\ell_2$-norm constraints on the columns of $W$.  ( 2 min )
    Deep Orientation-Aware Functional Maps: Tackling Symmetry Issues in Shape Matching. (arXiv:2204.13453v1 [cs.CV])
    State-of-the-art fully intrinsic networks for non-rigid shape matching often struggle to disambiguate the symmetries of the shapes leading to unstable correspondence predictions. Meanwhile, recent advances in the functional map framework allow to enforce orientation preservation using a functional representation for tangent vector field transfer, through so-called complex functional maps. Using this representation, we propose a new deep learning approach to learn orientation-aware features in a fully unsupervised setting. Our architecture is built on top of DiffusionNet, making it robust to discretization changes. Additionally, we introduce a vector field-based loss, which promotes orientation preservation without using (often unstable) extrinsic descriptors.  ( 2 min )
    Predicting single-cell perturbation responses for unseen drugs. (arXiv:2204.13545v1 [cs.LG])
    Single-cell transcriptomics enabled the study of cellular heterogeneity in response to perturbations at the resolution of individual cells. However, scaling high-throughput screens (HTSs) to measure cellular responses for many drugs remains a challenge due to technical limitations and, more importantly, the cost of such multiplexed experiments. Thus, transferring information from routinely performed bulk RNA-seq HTS is required to enrich single-cell data meaningfully. We introduce a new encoder-decoder architecture to study the perturbational effects of unseen drugs. We combine the model with a transfer learning scheme and demonstrate how training on existing bulk RNA-seq HTS datasets can improve generalisation performance. Better generalisation reduces the need for extensive and costly screens at single-cell resolution. We envision that our proposed method will facilitate more efficient experiment designs through its ability to generate in-silico hypotheses, ultimately accelerating targeted drug discovery.  ( 2 min )
    ELM: Embedding and Logit Margins for Long-Tail Learning. (arXiv:2204.13208v1 [cs.LG])
    Long-tail learning is the problem of learning under skewed label distributions, which pose a challenge for standard learners. Several recent approaches for the problem have proposed enforcing a suitable margin in logit space. Such techniques are intuitive analogues of the guiding principle behind SVMs, and are equally applicable to linear models and neural models. However, when applied to neural models, such techniques do not explicitly control the geometry of the learned embeddings. This can be potentially sub-optimal, since embeddings for tail classes may be diffuse, resulting in poor generalization for these classes. We present Embedding and Logit Margins (ELM), a unified approach to enforce margins in logit space, and regularize the distribution of embeddings. This connects losses for long-tail learning to proposals in the literature on metric embedding, and contrastive learning. We theoretically show that minimising the proposed ELM objective helps reduce the generalisation gap. The ELM method is shown to perform well empirically, and results in tighter tail class embeddings.  ( 2 min )
    Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models: Extension. (arXiv:1905.10395v5 [cs.LG] UPDATED)
    We consider distributed optimization under communication constraints for training deep learning models. We propose a new algorithm, whose parameter updates rely on two forces: a regular gradient step, and a corrective direction dictated by the currently best-performing worker (leader). Our method differs from the parameter-averaging scheme EASGD in a number of ways: (i) our objective formulation does not change the location of stationary points compared to the original optimization problem; (ii) we avoid convergence decelerations caused by pulling local workers descending to different local minima to each other (i.e. to the average of their parameters); (iii) our update by design breaks the curse of symmetry (the phenomenon of being trapped in poorly generalizing sub-optimal solutions in symmetric non-convex landscapes); and (iv) our approach is more communication efficient since it broadcasts only parameters of the leader rather than all workers. We provide theoretical analysis of the batch version of the proposed algorithm, which we call Leader Gradient Descent (LGD), and its stochastic variant (LSGD). Finally, we implement an asynchronous version of our algorithm and extend it to the multi-leader setting, where we form groups of workers, each represented by its own local leader (the best performer in a group), and update each worker with a corrective direction comprised of two attractive forces: one to the local, and one to the global leader (the best performer among all workers). The multi-leader setting is well-aligned with current hardware architecture, where local workers forming a group lie within a single computational node and different groups correspond to different nodes. For training convolutional neural networks, we empirically demonstrate that our approach compares favorably to state-of-the-art baselines. This work is a gentle extension of [2].  ( 3 min )
    Overcoming Catastrophic Forgetting via Direction-Constrained Optimization. (arXiv:2011.12581v2 [cs.LG] UPDATED)
    This paper studies a new design of the optimization algorithm for training deep learning models with a fixed architecture of the classification network in a continual learning framework. The training data is non-stationary and the non-stationarity is imposed by a sequence of distinct tasks. We first analyze a deep model trained on only one learning task in isolation and identify a region in network parameter space, where the model performance is close to the recovered optimum. We provide empirical evidence that this region resembles a cone that expands along the convergence direction. We study the principal directions of the trajectory of the optimizer after convergence and show that traveling along a few top principal directions can quickly bring the parameters outside the cone but this is not the case for the remaining directions. We argue that catastrophic forgetting in a continual learning setting can be alleviated when the parameters are constrained to stay within the intersection of the plausible cones of individual tasks that were so far encountered during training. Based on this observation we present our direction-constrained optimization (DCO) method, where for each task we introduce a linear autoencoder to approximate its corresponding top forbidden principal directions. They are then incorporated into the loss function in the form of a regularization term for the purpose of learning the coming tasks without forgetting. Furthermore, in order to control the memory growth as the number of tasks increases, we propose a memory-efficient version of our algorithm called compressed DCO (DCO-COMP) that allocates a memory of fixed size for storing all autoencoders. We empirically demonstrate that our algorithm performs favorably compared to other state-of-art regularization-based continual learning methods.  ( 2 min )
    R-MBO: A Multi-surrogate Approach for Preference Incorporation in Multi-objective Bayesian Optimisation. (arXiv:2204.13166v1 [stat.ML])
    Many real-world multi-objective optimisation problems rely on computationally expensive function evaluations. Multi-objective Bayesian optimisation (BO) can be used to alleviate the computation time to find an approximated set of Pareto optimal solutions. In many real-world problems, a decision-maker has some preferences on the objective functions. One approach to incorporate the preferences in multi-objective BO is to use a scalarising function and build a single surrogate model (mono-surrogate approach) on it. This approach has two major limitations. Firstly, the fitness landscape of the scalarising function and the objective functions may not be similar. Secondly, the approach assumes that the scalarising function distribution is Gaussian, and thus a closed-form expression of an acquisition function e.g., expected improvement can be used. We overcome these limitations by building independent surrogate models (multi-surrogate approach) on each objective function and show that the distribution of the scalarising function is not Gaussian. We approximate the distribution using Generalised value distribution. We present an a-priori multi-surrogate approach to incorporate the desirable objective function values (or reference point) as the preferences of a decision-maker in multi-objective BO. The results and comparison with the existing mono-surrogate approach on benchmark and real-world optimisation problems show the potential of the proposed approach.  ( 2 min )
    Unlocking High-Accuracy Differentially Private Image Classification through Scale. (arXiv:2204.13650v1 [cs.LG])
    Differential Privacy (DP) provides a formal privacy guarantee preventing adversaries with access to a machine learning model from extracting information about individual training points. Differentially Private Stochastic Gradient Descent (DP-SGD), the most popular DP training method, realizes this protection by injecting noise during training. However previous works have found that DP-SGD often leads to a significant degradation in performance on standard image classification benchmarks. Furthermore, some authors have postulated that DP-SGD inherently performs poorly on large models, since the norm of the noise required to preserve privacy is proportional to the model dimension. In contrast, we demonstrate that DP-SGD on over-parameterized models can perform significantly better than previously thought. Combining careful hyper-parameter tuning with simple techniques to ensure signal propagation and improve the convergence rate, we obtain a new SOTA on CIFAR-10 of 81.4% under (8, 10^{-5})-DP using a 40-layer Wide-ResNet, improving over the previous SOTA of 71.7%. When fine-tuning a pre-trained 200-layer Normalizer-Free ResNet, we achieve a remarkable 77.1% top-1 accuracy on ImageNet under (1, 8*10^{-7})-DP, and achieve 81.1% under (8, 8*10^{-7})-DP. This markedly exceeds the previous SOTA of 47.9% under a larger privacy budget of (10, 10^{-6})-DP. We believe our results are a significant step towards closing the accuracy gap between private and non-private image classification.  ( 2 min )
    AlphaZero-Inspired General Board Game Learning and Playing. (arXiv:2204.13307v1 [cs.LG])
    Recently, the seminal algorithms AlphaGo and AlphaZero have started a new era in game learning and deep reinforcement learning. While the achievements of AlphaGo and AlphaZero - playing Go and other complex games at super human level - are truly impressive, these architectures have the drawback that they are very complex and require high computational resources. Many researchers are looking for methods that are similar to AlphaZero, but have lower computational demands and are thus more easily reproducible. In this paper, we pick an important element of AlphaZero - the Monte Carlo Tree Search (MCTS) planning stage - and combine it with reinforcement learning (RL) agents. We wrap MCTS for the first time around RL n-tuple networks to create versatile agents that keep at the same time the computational demands low. We apply this new architecture to several complex games (Othello, ConnectFour, Rubik's Cube) and show the advantages achieved with this AlphaZero-inspired MCTS wrapper. In particular, we present results that this AlphaZero-inspired agent is the first one trained on standard hardware (no GPU or TPU) to beat the very strong Othello program Edax up to and including level 7 (where most other algorithms could only defeat Edax up to level 2).  ( 2 min )
    Standardized Evaluation of Machine Learning Methods for Evolving Data Streams. (arXiv:2204.13625v1 [cs.LG])
    Due to the unspecified and dynamic nature of data streams, online machine learning requires powerful and flexible solutions. However, evaluating online machine learning methods under realistic conditions is difficult. Existing work therefore often draws on different heuristics and simulations that do not necessarily produce meaningful and reliable results. Indeed, in the absence of common evaluation standards, it often remains unclear how online learning methods will perform in practice or in comparison to similar work. In this paper, we propose a comprehensive set of properties for high-quality machine learning in evolving data streams. In particular, we discuss sensible performance measures and evaluation strategies for online predictive modelling, online feature selection and concept drift detection. As one of the first works, we also look at the interpretability of online learning methods. The proposed evaluation standards are provided in a new Python framework called float. Float is completely modular and allows the simultaneous integration of common libraries, such as scikit-multiflow or river, with custom code. Float is open-sourced and can be accessed at https://github.com/haugjo/float. In this sense, we hope that our work will contribute to more standardized, reliable and realistic testing and comparison of online machine learning methods.  ( 2 min )
    Quality Inference in Federated Learning with Secure Aggregation. (arXiv:2007.06236v3 [cs.LG] UPDATED)
    Federated learning algorithms are developed both for efficiency reasons and to ensure the privacy and confidentiality of personal and business data, respectively. Despite no data being shared explicitly, recent studies showed that the mechanism could still leak sensitive information. Hence, secure aggregation is utilized in many real-world scenarios to prevent attribution to specific participants. In this paper, we focus on the quality of individual training datasets and show that such quality information could be inferred and attributed to specific participants even when secure aggregation is applied. Specifically, through a series of image recognition experiments, we infer the relative quality ordering of participants. Moreover, we apply the inferred quality information to detect misbehaviours, to stabilize training performance, and to measure the individual contributions of participants.  ( 2 min )
    FedShuffle: Recipes for Better Use of Local Work in Federated Learning. (arXiv:2204.13169v1 [cs.LG])
    The practice of applying several local updates before aggregation across clients has been empirically shown to be a successful approach to overcoming the communication bottleneck in Federated Learning (FL). In this work, we propose a general recipe, FedShuffle, that better utilizes the local updates in FL, especially in the heterogeneous regime. Unlike many prior works, FedShuffle does not assume any uniformity in the number of updates per device. Our FedShuffle recipe comprises four simple-yet-powerful ingredients: 1) local shuffling of the data, 2) adjustment of the local learning rates, 3) update weighting, and 4) momentum variance reduction (Cutkosky and Orabona, 2019). We present a comprehensive theoretical analysis of FedShuffle and show that both theoretically and empirically, our approach does not suffer from the objective function mismatch that is present in FL methods which assume homogeneous updates in heterogeneous FL setups, e.g., FedAvg (McMahan et al., 2017). In addition, by combining the ingredients above, FedShuffle improves upon FedNova (Wang et al., 2020), which was previously proposed to solve this mismatch. We also show that FedShuffle with momentum variance reduction can improve upon non-local methods under a Hessian similarity assumption. Finally, through experiments on synthetic and real-world datasets, we illustrate how each of the four ingredients used in FedShuffle helps improve the use of local updates in FL.  ( 2 min )
    A Locally Adaptive Interpretable Regression. (arXiv:2005.03350v4 [stat.ML] UPDATED)
    Machine learning models with both good predictability and high interpretability are crucial for decision support systems. Linear regression is one of the most interpretable prediction models. However, the linearity in a simple linear regression worsens its predictability. In this work, we introduce a locally adaptive interpretable regression (LoAIR). In LoAIR, a metamodel parameterized by neural networks predicts percentile of a Gaussian distribution for the regression coefficients for a rapid adaptation. Our experimental results on public benchmark datasets show that our model not only achieves comparable or better predictive performance than the other state-of-the-art baselines but also discovers some interesting relationships between input and target variables such as a parabolic relationship between CO2 emissions and Gross National Product (GNP). Therefore, LoAIR is a step towards bridging the gap between econometrics, statistics, and machine learning by improving the predictive ability of linear regression without depreciating its interpretability.  ( 2 min )
    Semantic Communication: An Information Bottleneck View. (arXiv:2204.13366v1 [cs.IT])
    Motivated by recent success of machine learning tools at the PHY layer and driven by high bandwidth demands of the next wireless communication standard 6G, the old idea of semantic communication by Weaver from 1949 has received considerable attention. It breaks with the classic design paradigm according to Shannon by aiming to transmit the meaning of a message rather than its exact copy and thus potentially allows for savings in bandwidth. In this work, inspired by Weaver, we propose an information-theoretic framework where the semantic context is explicitly introduced into probabilistic models. In particular, for bandwidth efficient transmission, we define semantic communication system design as an Information Bottleneck optimization problem and consider important implementation aspects. Further, we uncover the restrictions of the classic 5G communication system design w.r.t. semantic context. Notably, based on the example of distributed image classification, we reveal the huge potential of a semantic communication system design. Numerical results show a tremendous saving in bandwidth of 20 dB with our proposed approach ISCNet compared to a classic PHY layer design.  ( 2 min )
    On the Use of Dimension Reduction or Signal Separation Methods for Nitrogen River Pollution Source Identification. (arXiv:2204.13182v1 [stat.AP])
    Identification of the current and expected future pollution sources to rivers is crucial for sound environmental management. For this purpose numerous approaches were proposed that can be clustered under physical based models, stable isotope analysis and mixing methods, mass balance methods, time series analysis, land cover analysis, and spatial statistics. Another extremely common method is Principal Component Analysis, as well as its modifications, such as Absolute Principal Component Score. they have been applied to the source identification problems for nitrogen entry to rivers. This manuscript is checking whether PCA can really be a powerful method to uncover nitrogen pollution sources considering its theoretical background and assumptions. Moreover, slightly similar techniques, Independent Component Analysis and Factor Analysis will also be considered.  ( 2 min )
    On the Normalizing Constant of the Continuous Categorical Distribution. (arXiv:2204.13290v1 [stat.ML])
    Probability distributions supported on the simplex enjoy a wide range of applications across statistics and machine learning. Recently, a novel family of such distributions has been discovered: the continuous categorical. This family enjoys remarkable mathematical simplicity; its density function resembles that of the Dirichlet distribution, but with a normalizing constant that can be written in closed form using elementary functions only. In spite of this mathematical simplicity, our understanding of the normalizing constant remains far from complete. In this work, we characterize the numerical behavior of the normalizing constant and we present theoretical and methodological advances that can, in turn, help to enable broader applications of the continuous categorical distribution. Our code is available at https://github.com/cunningham-lab/cb_and_cc/.  ( 2 min )
    Asymptotic Inference for Infinitely Imbalanced Logistic Regression. (arXiv:2204.13231v1 [math.ST])
    In this paper we extend the work of Owen (2007) by deriving a second order expansion for the slope parameter in logistic regression, when the size of the majority class is unbounded and the minority class is finite. More precisely, we demonstrate that the second order term converges to a normal distribution and explicitly compute its variance, which surprisingly once again depends only on the mean of the minority class points and not their arrangement under mild regularity assumptions. In the case that the majority class is normally distributed, we illustrate that the variance of the the limiting slope depends exponentially on the z-score of the average of the minority class's points with respect to the majority class's distribution. We confirm our results by Monte Carlo simulations.  ( 2 min )

  • Open

    Should I become an Art therapist
    submitted by /u/BalanceSubstantial66 [link] [comments]
    Is it easier to mimic a model based on its input/output or to train an original model in the first place ?
    An original model is trained with a data-set, typically labeled by humans (say for classifiers). However, what if one would like to copycat a closed model only exposed through an API ? By doing this, the data-set instead would be the input / output of the original model. ​ Is it easier to train the copycat model or the original one ? How much data would be required to train the copycat versus the original one ? Are there practical examples or this happening ? Can it possibly worth it and if so under which circumstances ? How much data would be required for example for the notorious Dall-e 2? submitted by /u/Wishmaster04 [link] [comments]  ( 1 min )
    Disney princesses according to AI. Is this done manually or through an AI app?
    submitted by /u/p0goniphaft111 [link] [comments]
    Specification gaming: the flip side of AI ingenuity
    submitted by /u/estasfuera [link] [comments]
    Beyond interpretability: developing a language to shape our relationships with AI
    submitted by /u/estasfuera [link] [comments]
    Guide to Iteratively Tuning GNN's
    submitted by /u/aidev2040 [link] [comments]
    Last Week in AI: AI Driving Instructors, MASSIVE Speech Dataset, AI ’Show Stealers’, Tesla Jet Crash
    submitted by /u/regalalgorithm [link] [comments]
    Best way to format dialogue to fine tune GPT-J, 3 ...
    Hi all, a quick questions, given that online I'm not finding that many info: how would you format a dialogue between people to fine tune a GPT model (be it GPT-J, GPT-3 etc.)? For example, if I want the GPT model to create a new dialogue from the show "Friends" (choosed this because all dialogues are available online https://www.kaggle.com/datasets/blessondensil294/friends-tv-series-screenplay-script?resource=download ) how should I format the input? ​ [Scene: Central Perk, Chandler, Joey, Phoebe, and Monica are there.] Monica: There's nothing to tell! He's just some guy I work with! Joey: C'mon, you're going out with the guy! There's gotta be something wrong with him! Chandler: All right Joey, be nice. So does he have a hump? A hump and a hairpiece? Phoebe: Wait, does he eat chalk? (They all stare, bemused.) Phoebe: Just, 'cause, I don't want her to go through what I went through with Carl- oh! Monica: Okay, everybody relax. This is not even a date. It's just two people going out to dinner and- not having sex. Chandler: Sounds like a date to me. submitted by /u/Sgnarf1989 [link] [comments]  ( 1 min )
    Meta's AI team searches for the secret trick of human intelligence
    submitted by /u/much_successes [link] [comments]
    I am new to Artificial Intelligence and I need your worthy suggestions about AI.
    Hey guys, I am new to AI. Please suggest some good and new topics/technologies that are supported by AI which will be very informative and beneficial for me in my career. Every suggestion is a high priority for me. Waiting for your replies. Thanks submitted by /u/adilonreddit1 [link] [comments]  ( 1 min )
    AI Augmentation in this Assemblage 23 music video
    submitted by /u/LightOfAntara [link] [comments]
    How Human Pose Estimation Technology Can Be Used in 2022?
    Hi everyone, I want to share with you an article that I worked on with my colleague about the use cases of human pose estimation. I would love it if you could check it out and share your ideas for using this technology in the comments below. https://mobidev.biz/blog/human-pose-estimation-technology-guide submitted by /u/Data-Power [link] [comments]
    Deepmind Researchers Propose Fair Normalizing Flows (FNF): A Rigorous Approach For Learning Fair Representations
    ​ https://preview.redd.it/2hryjv40cgw81.png?width=1024&format=png&auto=webp&s=30f13567b83b3cdfe5558d09f188b2e69da7e73f Fair representation learning has emerged as one of the most promising techniques to encode data into new, impartial representations with high utility as machine learning is increasingly utilized in settings that potentially harm humans. Fair representation means presenting data without regard to gender, color, or other factors. Due to human-introduced bias, these biases are found in the word vector representations in language models. The goal of Learning Fair Representation is to reduce bias by decreasing the semantic distance between biassed terms. The goal of fair representation learning is to guarantee that representations are useful for a variety of prediction tasks and that sensitive aspects of the original data cannot be extracted from them. Adversarial training, which combines an encoder aiming to turn data into a fair representation with an adversary attempting to recover sensitive features from the representation, is the most used method for learning fair representations. However, recent research has discovered that these methods do not provide completely fair representations: stronger opponents can recover sensitive features. This could allow malicious or uneducated users to discriminate using the available representations. The issue of fair representation has lately risen in prominence as regulators draught guidelines on the ethical use of AI, indicating that any company that cannot ensure non-discrimination would be held liable for the data produced. Continue Reading Paper: https://openreview.net/pdf?id=BrFIKuxrZE Github: https://github.com/eth-sri/fnf submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    It’s not a place (made with starryai)
    submitted by /u/Losthel [link] [comments]
    Night Cafe "fire on the nuclear plant in the amazon by Simon Stalenhag"
    submitted by /u/brunovianna [link] [comments]
    Ladybugs (A.I. animation + sound design)
    submitted by /u/nenomancer [link] [comments]  ( 1 min )
    Adaptive Multi-Strategy Market-Making Agent For Volatile Markets
    submitted by /u/akolonin [link] [comments]  ( 1 min )
    Artificial Nightmares: Hall Monitor || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
  • Open

    Playing Chess With Offline Reinforcement Learning
    submitted by /u/Bellerb [link] [comments]  ( 1 min )
    How to use RL for combinatorial optimization
    I am trying to use reinforcement for combinatorial optimization. I coded up a toy example for the travelling salesman but I am confused by something. In stable baselines 3 there is a variable "done". If I set it to optimum value (that is the length of the shortest route through all the cities computer by a different method) the RL algorithm never finds it. What should done be set to? Or more generally how do you do optimization of this sort using RL. I can just set it so that it is done after 1000 steps but then it is spending a lot of time finding the best way to take the first 1000 steps which isn't quite the point. submitted by /u/wiggyhat [link] [comments]  ( 2 min )
    Can agents act simultaneously with no notion of turn-taking?
    I was reading this paper and in section 3 they claim that agents act simultaneously and there is no notion of turn-taking: https://arxiv.org/abs/2104.07750 I was wondering how this works. What I'm used to seeing is a for loop in which, one after the other, agents execute the step function and interact with the environment. How does this change if all agents act simultaneously? submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    How to mitigate catastrophic forgetting for an intelligent agent using replay memory and Deep Reinforcement Learning algorithm
    Hello everyone. I built a use case to instruct an agent, which simulates a herd of predators, to encircle a prey. Episodes are non-terminal and have a fixed number of steps. The agent uses a neural network to decide which actions to take at each step and is instructed by a remember/replay mechanism using a memory of past events. I started the experiment using absolute coordinates in the state returned by the environment and the result is displayed in the following image, where the total reward values at the end of each episode ​​are displayed. https://preview.redd.it/oscet38c0gw81.png?width=640&format=png&auto=webp&s=092fc92ee56c919bdc1851bb9f1ef9bdc585bfde In this case the results were encouraging. However, I wanted to create a more generic use case, using vectors that represent the position of the prey with respect to predators instead of absolute coordinates, in order to make the task more generic. Unfortunately, the best result I was able to obtain is the one shown in the image below where, apparently, I have a deterioration after a certain number of episodes. ​ https://preview.redd.it/yoiygwxe0gw81.png?width=640&format=png&auto=webp&s=06b7082cc0ebb80e3f5e5ce8307d75cd6451f79b It seems to me a case of catastrophic forgetting (I could be wrong) and I managed to mitigate it by increasing the memory, retaining the results of the older episodes, lowering the learning rate of the neural network and using a learning decay algorithm, but I have not succeeded to eliminate the phenomenon completely. Anyone have any advice? submitted by /u/kaeldric__ [link] [comments]  ( 2 min )
    Speeding up custom implementations.
    Hi all! I've been implementing some of the model-free algorithms in recent times. Comparing their performance to open-source libraries, however, learning takes severely longer in a computational sense. I wonder what tricks libraries such as stable-baselines3 use to increase frames per seconds and accelerate policy updates. So far, I vectorized the environment sampling & the bottleneck seems to be computing the updates for the agent. Thanks a lot! submitted by /u/Internal-Brush4929 [link] [comments]  ( 1 min )
    Any blog or video series that is available to run TD3 algorithm for path planning purpose . Trained agents deployed in hardware level
    submitted by /u/ajithvallabai [link] [comments]  ( 1 min )
    Microsoft AI Researchers Introduce PPE: A Mathematically Guaranteed Reinforcement learning (RL) Algorithm For Exogenous Noise
    Reinforcement learning (RL) is a machine learning training strategy that rewards desirable behaviors while penalizing undesirable ones. A reinforcement learning agent can perceive and comprehend its surroundings, act, and learn through trial and error in general. Although RL agents can heuristically solve some problems, such as assisting a robot in navigating to a specific location in a given environment, there is no guarantee that they will be able to handle problems in settings they have not yet encountered. The capacity of these models to recognize the robot and any obstacles in its path, but not changes in its surrounding environment that occur independently of the agent, which we refer to as exogenous noise, is critical to their success. Existing RL algorithms are not powerful enoug…  ( 2 min )
  • Open

    [D] Usage Optimized AWS GPUs, 57-63% off On-Demand Prices
    www.usage.ai ​ Usage AI bundles 3-year no-upfront RIs on AWS with guaranteed buyback -- so users get all the savings of 3-year RIs with none of the commitment. I helped engineer the product. Here to answer any questions! submitted by /u/usage-team [link] [comments]
    [P] Blog post: Learning JAX by Learning to Learn
    I recently published a new blog post, which goes over how meta-learned optimizers work and how to implement them in JAX. JAX's composable function transforms make implementing meta-learning algorithms very straightforward. If you're interested in JAX or meta learning give it a read! Blog post: https://teddykoker.com/2022/04/learning-to-learn-jax/ Code: https://github.com/teddykoker/learning-to-learn-jax Original Paper (NeurIPS '16): https://arxiv.org/abs/1606.04474 submitted by /u/tomkoker [link] [comments]  ( 1 min )
    [P] Introducing FlowMeter for network packet analysis
    We’ve released a new open source project - https://github.com/deepfence/FlowMeter - to analyze and classify packet captures using ML techniques. FlowMeter is an experimental project; we’re using it to evaluate how effectively we can train an ML model to discriminate between different types of traffic flows, e.g. normal and anomalous. You can use sample data from various sources (see the README), or gather packet captures using PacketStreamer https://github.com/deepfence/PacketStreamer or other pcap tools. More information in the README, here: https://github.com/deepfence/FlowMeter and this blogpost: https://medium.com/@siddharthsatpathy.ss/introducing-flowmeter-97e0507862b6 Hope some people find it useful; we’d welcome any feedback, thank you. submitted by /u/sidd_ss [link] [comments]  ( 1 min )
    [P] open-source python library for making machine learning demos that runs in the browser or inside a jupyter notebook/google colab, package is available on PyPI
    https://gradio.app/ is for demoing machine learning models https://reddit.com/link/ueod3x/video/75e1ozmjihw81/player Prerequisite: Python 3.7+ and that's it! Quick Start To get Gradio running with a simple "Hello, World" example, follow these three steps: Install Gradio from pip. ​ pip install gradio Run the code below as a Python script or in a Python notebook (or in a colab notebook). ​ import gradio as gr def greet(name): return "Hello " + name + "!!" demo = gr.Interface(fn=greet, inputs="text", outputs="text") if __name__ == "__main__": demo.launch() The interface below will appear automatically within the Python notebook, or pop in a browser on http://localhost:7860 if running from a script. see more in the getting started guide: https://gradio.app/getting_started/ submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 2 min )
    [P] Hot Reloading for Pandas
    Hi guys I thought you might find my project useful. It's called Reloadium and saves a lot of time during python development, especially in data science and machine learning field. More details here: https://github.com/reloadware/reloadium Hot Reloading dataframes Modifying and fixing code during debugging What do you guys think about it? submitted by /u/kwazar90 [link] [comments]  ( 1 min )
    [D] SHAP method to get feature importance: is linearity realistic?
    The SHAP method assumes linear relationship between the feature effects (see definition "Additive feature attribution methods" in the paper). But is this assumption realistic? submitted by /u/savoga [link] [comments]  ( 1 min )
    [D] Need to find a good self-hosted medical image annotation tool.
    Hi. I'm trying to come up with a solution for a medical image annotation system for our laboratory. We need it to be self-hosted (in order to only work in the uni's internal network), and open source (since the funds are limited). So far I've only found out about https://lab.vindr.ai/dashboard/projects but the documentation is really bad and I could not launch it using docker compose. I've also found MONAILabel(https://github.com/Project-MONAI/MONAILabel), but it apparently requires GPU which makes it really expensive. I'd rather find a cpu based solution because our task is not that complex. We only get some Dicom files (each have studies in them), and want to label them. submitted by /u/feryet [link] [comments]  ( 1 min )
    Machine Learning in Fantasy Premier League? [D]
    https://medium.com/@subin.sen7/our-client-is-the-worlds-best-fpl-player-this-is-not-clickbait-51481fae76c9 Is it possible to use models to beat the market in Fantasy Football? submitted by /u/AI_FPL [link] [comments]
    [P] XGboost, sklearn and others running over encrypted data
    Hello everyone! Following this post numpy in fhe we are releasing a new lib that allows popular machine learning frameworks to run over encrypted data: https://github.com/zama-ai/concrete-ml Currently this supports xgboost and many sklearn models. We also support pytorch to some extent. We are trying to closely follow sklearn API (when relevant) to make the use easy to machine learning practitioners. Happy to hear any feedback on this ! submitted by /u/strojax [link] [comments]  ( 1 min )
    [D] Is Tensorflow.js still much slower than Tensorflow
    Last time I used Tensorflow was 3 years ago and it was a resource hog as well as very slow. I intend to get back into machine learning and wondering if I should give Tensorflow.js a go again using the Nodejs backend since this will be an electron app. Or should I just go straight for Tensorflow. Ideally I will need about 15 fps while taking up minimal resources submitted by /u/manrayboy [link] [comments]  ( 1 min )
    [D] Collaboration-first machine learning platform that enables you to build, train, track, and share your ML projects simply with a few lines of code
    Hey r/MachineLearning, I'm Derrick from Layer (layer.ai) - the collaboration-first machine learning platform that enables you to build, train, track, and share your ML projects simply with a few lines of code. We are soft-launching today! I’ve been working on Layer for the past 2 years with an awesome team around the world. We really poured our hearts and minds into Layer and hope you will like it. Your feedback would be very appreciated! Layer Demo To get started, you can simply run our Quickstart Example! How is Layer different from other tools? Although there are plenty of ML and DS tooling products, we believe that there is still a large gap around collaboration. Many data science projects are hosted on GitHub, which, in our experience, does not provide sufficient depth and abs…  ( 4 min )
  • Open

    How to Tune Graph Neural Networks
    submitted by /u/aidev2040 [link] [comments]
  • Open

    A one-up on motion capture
    A new neural network approach captures the characteristics of a physical system’s dynamic motion from video, regardless of rendering configuration or image differences.  ( 7 min )
    Engineers use artificial intelligence to capture the complexity of breaking waves
    Their model’s predictions should help researchers improve ocean climate simulations and hone the design of offshore structures.  ( 6 min )
  • Open

    Extracting Skill-Centric State Abstractions from Value Functions
    Posted by Dhruv Shah, Intern, and Brian Ichter, Research Scientist, Robotics at Google Advances in reinforcement learning (RL) for robotics have enabled robotic agents to perform increasingly complex tasks in challenging environments. Recent results show that robots can learn to fold clothes, dexterously manipulate a rubik’s cube, sort objects by color, navigate complex environments and walk on difficult, uneven terrain. But "short-horizon" tasks such as these, which require very little long-term planning and provide immediate failure feedback, are relatively easy to train compared to many tasks that may confront a robot in a real-world setting. Unfortunately, scaling such short-horizon skills to the abstract, long horizons of real-world tasks is difficult. For example, how would one trai…  ( 7 min )
  • Open

    Techniques to Write Better Python Code
    We write a program to solve a problem or make a tool that we can repeatedly solve a similar problem. For the latter, it is inevitable that we come back to revisit the program we wrote, or someone else is reusing the program we write. There is also a chance that we will encounter data […] The post Techniques to Write Better Python Code appeared first on Machine Learning Mastery.  ( 15 min )
  • Open

    How to get AI to confuse a shark with a clam
    "The Megalodon was a large bivalve, measuring up to 2.5 meters in length. Its shell was covered in spines, and it had a large, powerful jaw for crushing prey." Although the megalodon is the most widely known as a giant prehistoric shark, I recently learned that Megalodon  ( 4 min )
    Bonus: GPT-3 answers my questions about sawflies, badly
    AI Weirdness: the strange side of machine learning  ( 1 min )
  • Open

    ‘I Doubt, Therefore I Am,’ Said AI
    It’s Monday morning, and Paul opens one of several emails sent by his boss, Heather. Her email seems a bit unusual, asking Paul to rush to… Continue reading on Becoming Human: Artificial Intelligence Magazine »  ( 4 min )
  • Open

    Designing Societally Beneficial Reinforcement Learning Systems
    Deep reinforcement learning (DRL) is transitioning from a research field focused on game playing to a technology with real-world applications. Notable examples include DeepMind’s work on controlling a nuclear reactor or on improving Youtube video compression, or Tesla attempting to use a method inspired by MuZero for autonomous vehicle behavior planning. But the exciting potential for real world applications of RL should also come with a healthy dose of caution - for example RL policies are well known to be vulnerable to exploitation, and methods for safe and robust policy development are an active area of research. At the same time as the emergence of powerful RL systems in the real world, the public and researchers are expressing an increased appetite for fair, aligned, and safe machine …  ( 8 min )

  • Open

    Decision Transformers with Hugging Face
    submitted by /u/hellopaperspace [link] [comments]
    How do you get a global observation?
    Naive question: how do you pass a global observation of the environment to your actor and critic? In other words, is a global observation always available or does it depend on the environment? Can you give me a couple of examples? Thanks :) submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    Why do Monte Carlo methods have low bias compared to TD methods in RL. Does the bias term in RL has same meaning as in general in ML terminology?
    submitted by /u/aabra__ka__daabra [link] [comments]  ( 1 min )
    2nd Neural MMO challenge is out! Design your policy to master PvE and PvP challenge.
    submitted by /u/xiaolongzhu [link] [comments]  ( 1 min )
    Parallel environments affect Q learning performance
    Parallel environments can make the learning much more efficient, but sometimes using parallel environments seemed affect the performance. Does anyone know why and how to solve? submitted by /u/Traditional-Ad4492 [link] [comments]  ( 1 min )
    Papers for Green Vehicle Routing Problem
    My current work requires me to work with GVRP. The specific problem is for an EV (agent) to find the optimal route to deliver items based on charge consumption and not discharging. I'm currently modelling it as a graph datastructure although it could be different. Are there any papers (both RL and exact method based) for state of the art that I can look into regarding this. I'm also interested in earlier papers that can help me build understanding, and I can increase complexity slowly. Thanks for the help! submitted by /u/evilBotman [link] [comments]  ( 1 min )
    What is the current SOTA for single-threaded continuous-action control using RL?
    As above. I am interested in RL for robotics, specifically for legged locomotion. I wish to explore RL training on the real robot. Sample efficiency is paramount. Has any progress been made by utilizing, say, RNNs/LSTMs or even Attention ? submitted by /u/pakodanomics [link] [comments]  ( 1 min )
    Is it possible/useful to allow the agent to influence the observations in later timesteps?
    In my current environment, the agent is fed observations from a particular mathematical computation based upon the underlying state. This mathematical computation has hyper-parameters that influence the resulting observation given to the agent. Does it make any sense whatsoever to give these hyper-parameters to the agent within the action space? In this way the agent would be able to adjust its future observations and perhaps it would be able to use this in a smart way. On the other hand, I've heard that changing the environment from under the agents feed during training can lead to major issues with training. One can imagine that learning a dynamic environment is harder than a static one. Any thoughts? submitted by /u/C_BearHill [link] [comments]  ( 1 min )
    treequeues: transfert jax pytrees between processes with very high speed!
    Hello! If you are using jax and you need to pass some pytrees between processes, I may have something for you :) I developed a "treequeue". It is a queue that is made for pytree's nested arrays. The transfer speed is up to 10 times higher than regular queues. This is done by utilizing shared memory arrays and avoiding pickling data. This can be very useful when developing distributed architecture, e.g. distributed reinforcement learning where speed is at the upmost importance. In my case this implementation was very useful to remove bottlenecks when implementing RL PBT algorithms! https://github.com/thomashirtz/treequeues Cheers! submitted by /u/krenast [link] [comments]  ( 1 min )
    Could you recommend a paper on self-supervised RL related to catastrophic forgetting?
    Hi, I have problems the agent forgets good states during self-supervised pre-training. It shows good exploration, but as time goes by, only the edge case is explored, showing poor performance at finetune. I found a paper about that before, but It is really hard to find again. It related to reward. Could you recommend a paper related to this? Thanks for reading. submitted by /u/Spiritual_Fig3632 [link] [comments]  ( 1 min )
  • Open

    [D] Any independent researchers ever get published/into conferences?
    For personal context: I worked over time during my bachelors and took grad classes/started research, but financial situation took a hit for the worse and we'll let's say I don't have enough money for a Masters/continue my studies. I'm job hunting now but in a chicken/egg problem most of these jobs want at the very least research (which I know how to conduct) for which they have the resources for. Either way I have a couple research topics I want to explore, but limited resources and want to be realistic here. main question: do you know of anyone who has done it as a non-grad/non-institutional way without the help of established figures in the field. I only know of those that have done it in the way I describe above (either under an established researcher's team, a university, or a company)? Add-on: would appreciate any personal experiences as well, and if anyone has experience with conferences/how long it takes, etc. My research experience has been largely under NDAs so I haven't experienced formal publishing yet (but want to!) submitted by /u/robml [link] [comments]  ( 2 min )
    [D] feature embeddings extraction for image classification
    Is there any rule of thumb to decide from which layer extract feature embeddings for classification tasks based on knn? Does it depend by the architecture ?(conv net , vs vit models, ecc) Should I extract before or after activation function? Edit: Which model do you suggest to start from ? Actually i'm using a resnet50 trained with DINO submitted by /u/Rich_Freedom98 [link] [comments]  ( 1 min )
    [P] Terraform Provider Iterative (TPI) - plugin for ML/AI workloads to spot instance recovery & auto-termination on AWS, GCP, Azure, Kubernetes
    Terraform Provider Iterative (TPI) address the specific needs of machine learning teams - it is an open-source tool extending the functionality of Terraform, the world's most widely used multi-cloud provisioning product. The tool enables full lifecycle management of computing resources and is designed specifically for machine learning pipelines: Terraform plugin for machine learning workloads: spot instance recovery & auto-termination | AWS, GCP, Azure, Kubernetes The tool aims to bridge the gap between devops and data science teams and build on top of Terraform, a tool universally familiar to devops teams, but extend it to suit machine learning needs. It provides to following advantages for your ML workflow: Lower cost: use your preferred cloud provider's existing pricing, including on-demand per-second billing and bulk discounts. Auto-recovery: spot/preemptible instances are cheap but unreliable. TPI reliably and automatically respawns such interrupted instances, caching & restoring the working directory in the cloud even when you are offline. Custom spec: full control over hardware & software requirements via a single config file. submitted by /u/cmstrump [link] [comments]  ( 1 min )
    [R] Flamingo: a Visual Language Model for Few-Shot Learning (from DeepMind)
    Paper (pdf). A link to the paper is also in this blog post. Abstract: Building models that can be rapidly adapted to numerous tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research. We introduce Flamingo, a family of Visual Language Models (VLM) with this ability. Flamingo models include key architectural innovations to: (i) bridge powerful pretrained vision-only and language-only models, (ii) handle sequences of arbitrarily interleaved visual and textual data, and (iii) seamlessly ingest images or videos as inputs. Thanks to their flexibility, Flamingo models can be trained on large-scale multimodal web corpora containing arbitrarily interleaved text and images, which is key to endow them with in-context few-shot learning capabilities. We perform a thorough evaluation of the proposed Flamingo models, exploring and measuring their ability to rapidly adapt to a variety of image and video understanding benchmarks. These include open-ended tasks such as visual question-answering, where the model is prompted with a question which it has to answer, captioning tasks, which evaluate the ability to describe a scene or an event, and close-ended tasks such as multiple choice visual question-answering. For tasks lying anywhere on this spectrum, we demonstrate that a single Flamingo model can achieve a new state of the art for few-shot learning, simply by prompting the model with task-specific examples. On many of these benchmarks, Flamingo actually surpasses the performance of models that are fine-tuned on thousands of times more task-specific data. submitted by /u/Wiskkey [link] [comments]  ( 1 min )
    [D] Self-Organizing Maps and Principal Component Analysis.
    Hello all, I am doing my PhD on numerical modelling of solar radiation and I am researching SOMs as a possible tool. I have this problem where I can't decide on a network size, so I have been trying to develop some means to compare SOM neurons from different sized networks. I have read that the 2-d SOM array tends to spam the 2 highest variance principal components. So would it be adecuate to tag my SOM nodes with the first two proyections of PCA? So that I can compare results from different experiments. submitted by /u/juliancanellas [link] [comments]  ( 1 min )
    "[Discussion]", Same MAE Result
    Hello guys, I'm wondering if you can help: we use machine learning models to identify trading opportunities using historical trading data of price and market indicators. The data we use is fine: abundant, accurate and (manually) proven to work over several non-sequential months in the past. The problem is that we keep on getting the exact same MAE result. Any suggestions on solving this issue would be immensely appreciated. submitted by /u/AFAC1410 [link] [comments]  ( 1 min )
    [P] New blog post: how to automatically find label errors in your audio datasets!
    Hi folks, our blog post on how to automatically find label errors in audio datasets has just gone live. We cover the steps to: ⛏️ Perform feature extraction (aka embeddings) on the Spoken Digit dataset with a pre-trained PyTorch model. 🔢 Use cross-validation to generate out-of-sample predicted probabilities for every example in the dataset. 🏷️ Run one line of cleanlab code on these predicted probabilities to identify which audio clips may be mislabeled. 📰 Blog Post + Google Colab: https://cleanlab.ai/blog/label-errors-audio-datasets/ https://preview.redd.it/vpzhwg6jebw81.png?width=1260&format=png&auto=webp&s=3fa79d8a097ae5936e5e6a51e87508adf6190835 submitted by /u/weijinglok [link] [comments]  ( 1 min )
    Shazam App for your brain? [R]
    https://arxiv.org/abs/2202.03265?context=eess Computer vision approach to processing naturalistic brain responses to music. Models achieved state of the art and they used publicly available datasets. With 1 sec of brain signal they can classify the name of the song you're listening to and how much you enjoyed it ~89%. Imagine this but on your airpods, seems wild. submitted by /u/blackliquerish [link] [comments]
    [D] What roles were you able to receive as a PhD with no top tier research?
    It’s looking more and more like I won’t be able to get any top tier publications in my PhD (NeurIPS, ICLR, etc.). I’ve tried, but various factors have limited the experience. I’m curious what people were able to do in this circumstance. I’ll have 3-5 publications in mid-tier venues, plus many years of experience as a software engineer and machine learning engineer (in non-tech companies). People in my program have gone on to work at Amazon, and a few Google, but these were all applied scientist and engineering roles. Does anyone go on to work as a research scientist? National labs seem like a good bet, I’m curious what others have done. submitted by /u/walterkronkite33 [link] [comments]  ( 5 min )
    [D] Have we stopped researching agents?
    It seems to me that the ML research community has stopped talking about agents? As in, machines that act in complex environments by themselves? Ever since GPT-3 came out, bascially every single paper has been related to NLP or Transformers in general. It seems a natural next step would be to try to get agents we could tell what to do and how to do it in natural language, no? Yet that has not been the focus at all. The last time we had wide spread focus on planning and acting was the Starcraft challenge, but that was "solved" so quickly nothing much came of it. Research I know about: - Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents - Tesla in general I guess, with their cars and maybe robots Is it just me or is there just no interest in researching more capable agents right now? submitted by /u/ReasonablyBadass [link] [comments]  ( 2 min )
    [D] Advertisement Allocation models being used?
    There are a bunch of papers on different advertisement allocation techniques (placing ads to valid ad spots) and in algorithms in grad school I learned MaxFlow/MinCut, but I'm wondering what is actually being used in production systems? Linear models, genetic algorithms, reinforcement learning, some neural network? I know that very little of this is published/talked about due to being proprietary. However, I'm wondering because so much gets published, but the more advanced algorithms tend to be complex and not necessarily possible to be deployed in production systems (what I suspect)? submitted by /u/Flipper3 [link] [comments]  ( 2 min )
    [D] ML model dev pipelines
    Hi Folks! It'd really help if you can participate in this poll and share about your ML/DL model workflow before moving it to production. Out of these steps, which one is your most preferred way of integrating ML into your app or business (not specific to any domain): View Poll submitted by /u/fgp121 [link] [comments]  ( 1 min )
    [R] RL papers relating to Green Vehicle Routing Problem
    My current work requires me to work with GVRP. The specific problem is for an EV (agent) to find the optimal route to deliver items based on charge consumption and not discharging. I'm currently modelling it as a graph datastructure although it could be different. Are there any papers (both RL and exact method based) for state of the art that I can look into regarding this. I'm also interested in earlier papers that can help me build understanding, and I can increase complexity slowly. Thanks for the help! submitted by /u/evilBotman [link] [comments]  ( 1 min )
    [D] What is the recommended way to estimate norm for gradient clipping?
    I have read several blogs in which they specified that you should clip your gradients to the largest value that doesn't cause exploding gradients. But that also means that you could get a situation that you don't need to do gradient clipping at all. Could that hurt convergence speed in some way? Is there any guideline for how to pick this hyperparameter based on learning rate, batch size, model size, input size (seq len, img size) etc.? Or is this project/use case dependent and it represents "just another hyperparameter"? submitted by /u/Icy_Fisherman7187 [link] [comments]  ( 1 min )
    [P] HuSpaCy: Industrial-strength Hungarian NLP
    I'd like to show off a Hungarian NLP pipeline which we've been heavily improving over the past year. https://github.com/huspacy/huspacy While processing Hungarian texts might not be interesting for the most of you, I believe this project can be a good learning resource, as our models utilize the latest NLP technologies from Explosion and are fully reproducible: We've created a transformers-based model on top of a language specific BERT model Multi-task and transfer-learning is heavily used across the models. Incorporated edit-tree lemmatization and biaffine parsing from spacy-experimental We provide word embeddings using the (fastText-like) floret tool submitted by /u/oroszgy [link] [comments]  ( 1 min )
    [D] How do you decide the ranges for hyperparameters when doing a grid search or a random search?
    I'm trying to tune the parameters of two models - a random forest classifier, and a gradient boosting classifier. When using a grid search or a random search (I might also play around with the genetic algorithm), what are appropriate ranges to use / how can I come to know them? I obviously can't just specify a very arbitrary range because that might decrease the effectiveness of the random search, and make the grid search too long. Do they depend on the number of features in the dataset or other characteristics (e.g. our dataset is quite imbalanced and we're using SMOTE resampling)? I'm just trying to tune n_estimators and max_depth, but RandomForestClassifier also has many other parameters, do I just experiment with all of them, or are there any known parameters that don't do anything useful unless I'm trying for an extra 0.1% of accuracy or recall? Basically the same questions as above for the GradientBoostingClassifier. Letting go of specific models for a second, I'd appreciate generic pointers on how to deliberately tune hyperparameters or specify ranges instead of just arbitrarily specifying sth and hoping it works. Thanks! submitted by /u/stuffingmybrain [link] [comments]  ( 2 min )
    How to do meaningful work as an independent researcher? [Discussion]
    With big players like OpenAI and Google building these massive models, how does independent researchers without access to such scale and compute do meaningful work? Came across tweets from researchers, especially ones working on generative models saying they feel their work looks irrelevant after seeing results from DALL-E 2. It feels like just a couple of years ago if you had a decent GPU setup, you could pretty much do world class research. Doesn't look like it anymore. Is there, if any, research directions that makes it a level playing field where compute and scale is not necessarily the solution, or are we all doomed to be prompt engineers for GPT models? submitted by /u/HairyIndianDude [link] [comments]  ( 4 min )
  • Open

    How long before dall-e 2 (or similarly capable) produces porn ?
    Dall-e 2 has been released a few weaks and is exceptionnally capable for generating images from a text description. However training data containing porn has been filtered out as well as requests that contains nsfw terms. So is it not suited for generated porn for now. However, how long would it takes before : Someone does something as capable and allows porn OpenAI opens dall-e 2 to such content (edit)This is an open discussion.But there is clear evidence that the technology now exists to create on demand and instantly any kind of porn with any kind of kink in a lot of styles including photorealistic style. submitted by /u/exotic_deviantcy [link] [comments]
    Augmented Military Soldiers
    We often believe that future soldiers are going to be robots. There will no longer be human "boots on the ground" thanks to artificial intelligence, drones, etc. Many of us fail to realize that current soldiers, in militaries across the globe, are being augmented with AI as we speak. Things like AR headsets, computerized sights, and other technologies are going to turn them into cyborgs...Another one of the intriguing aspects of this is that major companies like Microsoft are creating these technologies for the military. How long until we see augmented soldiers going against one another? https://www.linkedin.com/pulse/augmented-soldier-mvyl-associates-1f/?trackingId=FnvaWMtxMsfm2fJeQP1bgg%3D%3D submitted by /u/IsabeldeMontoya [link] [comments]  ( 1 min )
    A Parable Of Explainability
    submitted by /u/elcric_krej [link] [comments]
    AI Dream 18 - Visual Trip through Wonderland
    submitted by /u/LordPewPew777 [link] [comments]
    Community College AI Program Needs
    I am the instructor and lead of the AI program at a community college in North Carolina. I am developing the program currently and have money to spend on Cool Educational Robotics and potentially other tools. Does anybody have any recommendations for robots to teach python with classic search, reinforcement learning methods, or other university level algorithms? A robotic arm would be nice, but I need to tie it directly to a potential project in machine learning, deep learning, reinforcement learning, search, logic, computer vision, etc. We have a NAO robot and are getting a LIMO: https://www.robotlab.com/store/limo-agilex But we could probably use an Educational Robotic Arm Thanks! Our program is one of the first Associates programs in AI in the country. Check out our program website. It is completely available online and we accept out of state students. https://www.waynecc.edu/programs/ai/ Feel free to ask any questions as well. *Also, advice on best value cloud computing virtualized resources for high performance computing for deep learning projects is appreciated. We are currently planning on going with Microsoft's Data Science Virtual Machines but not sure which series exactly submitted by /u/Wayne_CC_AI [link] [comments]  ( 1 min )
    PeopleLens AI Helps The Blind | Brain Fingerprints Detect Autism | AI Predicts Cancer Tumor Regrowth
    submitted by /u/getrich_or_diemining [link] [comments]
    Uhhhh
    submitted by /u/Blazeolmo [link] [comments]
    Dall-E 2 access
    Hello everyone, I wanted to know if one could get access to the dall-e 2 app without having to hope to be chosen from the wait list. Thanks! submitted by /u/Swaggyswaggerson [link] [comments]  ( 1 min )
    A brief history of deepfakes - how it started and where it might take us
    submitted by /u/much_successes [link] [comments]
    𝐀𝐫𝐭𝐢𝐟𝐢𝐜𝐢𝐚𝐥 𝐈𝐧𝐭𝐞𝐥𝐥𝐢𝐠𝐞𝐧𝐜𝐞 𝐨𝐟 𝐓𝐡𝐢𝐧𝐠𝐬 (𝐀𝐈𝐨𝐓) - 𝐋𝐚𝐭𝐞𝐬𝐭 𝐓𝐞𝐜𝐡𝐧𝐨𝐥𝐨𝐠𝐲, 𝐅𝐮𝐭𝐮𝐫𝐞 𝐔𝐩𝐜𝐨𝐦𝐢𝐧𝐠 𝐓𝐫𝐞𝐧𝐝𝐬 𝐚𝐧𝐝 𝐅𝐨𝐫𝐞𝐜𝐚𝐬𝐭 𝐭𝐨 𝟐𝟎𝟐𝟖
    Our Latest research report on the Artificial Intelligence of Things (AIoT) market shows how that market is changing and how trends in demographics, business cycles, and microeconomics affect the Artificial Intelligence of Things (AIoT) market as a whole. Our study of the global Artificial Intelligence of Things (AIoT) market demonstrates what's happening with business state by looking at production value and key regions. The market report provides an entire analysis of sales volume, pricing analysis, revenue, the margin of profit, the expansion rate within the Artificial Intelligence of Things (AIoT) market. Get A Free Sample Report @ https://www.intelligencemarketreport.com/report-sample/573982 ​ \"𝐀𝐫𝐭𝐢𝐟𝐢𝐜𝐢𝐚𝐥 𝐈𝐧𝐭𝐞𝐥𝐥𝐢𝐠𝐞𝐧𝐜𝐞 𝐨𝐟 𝐓𝐡𝐢𝐧𝐠𝐬 \" Key Information Extracted from the Report ​ Extensive information on factors estimated to affect the Market growth and market share during the forecast period is presented in the report. The report offers the present scenario and future growth prospects Market in various geographical regions. The competitive landscape analysis on the market as well as the qualitative and quantitative information is delivered. The SWOT analysis is conducted along with Porter's Five Force analysis. · The in-depth analysis provides an insight into the Market, underlining the growth rate and opportunities offered in the business. Leading Key Players Included In This Report Are: · Twilio Inc. · ShiftPixy Inc. · Micron Technology · Intel · IBM · Gopher Protocol · Deep Vision · Ceva · ALCES · AISPEECH submitted by /u/Purva_Duggal [link] [comments]
    Free Webinar on Automated CV pipelines | Video Predictions
    Automated CV Pipelines 4th part is open for registration. It will be covering some of the best practices for video-specific annotation tasks. If you are interested you can check out the details here! submitted by /u/WeekendClassic [link] [comments]
    Stairway to (A.I. animation + sound design)
    submitted by /u/nenomancer [link] [comments]  ( 1 min )
    Ancient civs always burn (A.I. animation + some sound design)
    submitted by /u/nenomancer [link] [comments]
    Treasure Planet made with starryai
    submitted by /u/Losthel [link] [comments]
    Elon Musk's NEURALINK vs Bryan Johnson's KERNEL (No Surgery)
    submitted by /u/1024cities [link] [comments]  ( 1 min )
    Sentiment, cognitive distortions and market data time series - causal analysis for crypto markets
    submitted by /u/akolonin [link] [comments]  ( 1 min )
    High Tech Hacks 2022 ! !
    Hey guys! I’m excited to share with you an exciting upcoming hackathon, High Tech Hacks 2.0! High Tech Hacks is a free, international 24-hour hackathon on May 21-22nd, 2022 open to all high schoolers hoping to learn a new coding skill, compete for awesome prizes, or work with other like-minded hackers. Let’s invent, create, and push the boundaries of technology (as much as we can at one hackathon)! What to expect: Last year, participants learned the basics of web development, Python, virtual reality, and how to make a Discord bot from current software engineers at Microsoft, Amazon, Twilio, other tech companies, and Columbia University SHPE. Thanks to our company sponsors, each participant last year received nearly $400 worth of free software and swag. Register to earn FREE swag (t-shirts, water bottles, stickers!) Network with other passionate STEM high school students from around the world! (Last year we had participants from 26 countries signed up already!) This year we have even bigger prizes, competitions, and speakers so stay tuned! Reach out to me with more questions or email [hightechhackathon@gmail.com](mailto:hightechhackathon@gmail.com). Happy hacking! :D Sign up here to confirm your interest and get on our mailing list: Click Here to Register! Also, meet other hackers by Joining our Discord! For more, Check out our Website submitted by /u/HighTechHacks [link] [comments]  ( 1 min )
    MyStyle: The Best AI Face Manipulation to Date!
    submitted by /u/OnlyProggingForFun [link] [comments]  ( 1 min )
  • Open

    Amazon Rekognition introduces Streaming Video Events to provide real-time alerts on live video streams
    Today, AWS announced the general availability of Amazon Rekognition Streaming Video Events, a fully managed service for camera manufacturers and service providers that uses machine learning (ML) to detect objects such as people, pets, and packages in live video streams from connected cameras. Amazon Rekognition Streaming Video Events sends them a notification as soon as […]  ( 8 min )
    3xLOGIC uses Amazon Rekognition Streaming Video Events to provide intelligent video analytics on live video streams to monitoring agents
    3xLOGIC is a leader in commercial electronic security systems. They provide commercial security systems and managed video monitoring for businesses, hospitals, schools, and government agencies. Managed video monitoring is a critical component of a comprehensive security strategy for 3xLOGIC’s customers. With more than 50,000 active cameras in the field, video monitoring teams face a daily […]  ( 4 min )
    Abode uses Amazon Rekognition Streaming Video Events to provide real-time notifications to their smart home customers
    Abode Systems (Abode) offers homeowners a comprehensive suite of do-it-yourself home security solutions that can be set up in minutes and enables homeowners to keep their family and property safe. Since the company’s launch in 2015, in-camera motion detection sensors have played an essential part in Abode’s solution, enabling customers to receive notifications and monitor […]  ( 6 min )
    Pandas user-defined functions are now available in Amazon SageMaker Data Wrangler
    Amazon SageMaker Data Wrangler reduces the time to aggregate and prepare data for machine learning (ML) from weeks to minutes. With Data Wrangler, you can select and query data with just a few clicks, quickly transform data with over 300 built-in data transformations, and understand your data with built-in visualizations without writing any code. Additionally, […]  ( 4 min )
    How Searchmetrics uses Amazon SageMaker to automatically find relevant keywords and make their human analysts 20% faster
    Searchmetrics is a global provider of search data, software, and consulting solutions, helping customers turn search data into unique business insights. To date, Searchmetrics has helped more than 1,000 companies such as McKinsey & Company, Lowe’s, and AXA find an advantage in the hyper-competitive search landscape. In 2021, Searchmetrics turned to AWS to help with […]  ( 5 min )
    Identify paraphrased text with Hugging Face on Amazon SageMaker
    Identifying paraphrased text has business value in many use cases. For example, by identifying sentence paraphrases, a text summarization system could remove redundant information. Another application is to identify plagiarized documents. In this post, we fine-tune a Hugging Face transformer on Amazon SageMaker to identify paraphrased sentence pairs in a few steps. A truly robust […]  ( 10 min )
    How Moovit turns data into insights to help passengers avoid delays using Apache Airflow and Amazon SageMaker
    This is a guest post by Moovit’s Software and Cloud Architect, Sharon Dahan. Moovit, an Intel company, is a leading Mobility as a Service (MaaS) solutions provider and creator of the top urban mobility app. Moovit serves over 1.3 billion riders in 3,500 cities around the world. We help people everywhere get to their destination […]  ( 7 min )
  • Open

    How DNEG Helped Win Another Visual-Effects Oscar by Bringing ‘Dune’ to Life With NVIDIA RTX
    Featuring stunning visuals from futuristic interstellar worlds, including colossal sand creatures, Dune captivated audiences around the world. The sci-fi film picked up six Oscars last month at the 94th Academy Awards, including for Best Sound and Visual Effects. Adapted from Frank Herbert’s 1965 novel of the same name, Dune tells the story of Paul Atreides, Read article > The post How DNEG Helped Win Another Visual-Effects Oscar by Bringing ‘Dune’ to Life With NVIDIA RTX appeared first on NVIDIA Blog.  ( 4 min )
    Your Odyssey Awaits: Stream ‘Lost Ark’ to Nearly Any Device This GFN Thursday
    It’s a jam-packed GFN Thursday. This week brings the popular, free-to-play, action role-playing game Lost Ark to gamers across nearly all their devices, streaming on GeForce NOW. And that’s not all. GFN Thursday also delivers an upgraded experience in the 2.0.40 update. M1-based MacBooks, iMacs and Mac Minis are now supported natively. Plus, membership gift Read article > The post Your Odyssey Awaits: Stream ‘Lost Ark’ to Nearly Any Device This GFN Thursday appeared first on NVIDIA Blog.  ( 4 min )
  • Open

    Top 11 Retail Technology Trends For 2022
    The 2019 pandemic became a catalyst for retail businesses and their customers. Continue reading on Becoming Human: Artificial Intelligence Magazine »  ( 6 min )
  • Open

    How can we reduce the carbon footprint of global computing?
    Workshop hosted by MIT’s Climate and Sustainability Consortium, MIT-IBM Watson AI Lab, and the MIT Schwarzman College of Computing highlights how new approaches to computing can save energy and help the planet.  ( 8 min )
    Aging Brain Initiative awards fund five new ideas to study, fight neurodegeneration
    Competitive seed grants launch yearlong investigations of novel hypotheses about potential causes, biomarkers, treatments of Alzheimer’s and ALS.  ( 6 min )
  • Open

    Help to create a simple Neural Network... or find the bug
    Hello, I really need help creating a simple Neural Network that approximates the Rosenbrock function (a function with two variables). I have used the book "MATLAB Deep Learning" by Phil Kim, which provides code examples. However, when I make a simple example where the network is trained with backpropagation the output of the testing points simply comes out as ones. It is very early work so the code is very simple. I have just cut out the piece of code with Neural Network. Can someone give any tips or help me to figure out what is wrong? %% Neural Network % Rosenbrock function : f = (1 - x1).^2 + 100*(x2-x1.^2).^2 % X: 25x2 matrix % f_samp: 25x1 vector % Number of nodes in hidden layer nHNodes = 4 ; % Input layer has two nodes % Output layer has one node % Initial weigts W1 = 2*rand(nHNodes, k) - 1 ; W2 = 2*rand(1, nHNodes) - 1 ; alpha = 0.5 ; % Lerning rate % Backpropagation for i = 1:10000 for i = 1:size(X,1) x = X(i, :)'; d = f_samp(i); % Forward v1 = W1*x; y1 = fun_sigmoid(v1); v = W2*y1; y = fun_sigmoid(v); % Backward e = d - y; delta = y.*(1-y).*e; e1 = W2'*delta; delta1 = y1.*(1-y1).*e1; dW1 = alpha*delta1*x'; W1 = W1 + dW1; dW2 = alpha*delta*y1'; W2 = W2 + dW2; end end % Testing sampling points for i = 1:size(X,1) x = X(i, :)' ; v1 = W1*x ; y1 = fun_sigmoid(v1) ; v = W2*y1 ; y(i) = fun_sigmoid(v) ; end submitted by /u/TobiasFred [link] [comments]  ( 1 min )
    I need help making a neural network from scratch in python
    Here's my code: https://github.com/heyuhowudoin/mnist_ai I currently have 3 layers, 15, 15 and 10 neurons respectively. I'm using the MNIST database to get images of hand-drawn numbers and trying to classify them by which neuron in the final layer has the highest activation. I use a combination of sigmoid and ReLU as activation functions, and I am using minibatches of 100 pictures each. The thing I see happening is that it converges to a point where every neuron in the final layer outputs a 0 because then it has a relatively low error since the target for 9 out of 10 of those neurons is indeed 0, whereas the target for the correct output neuron is 1. I've tried looking over my backprop algorithm, I've tried changing the neurons in each hidden layer to 200, I've changed the learning speed a bunch but I can't seem to figure out why it's not working so if one of you guys wouldn't mind helping me, that would be much appreciated. submitted by /u/-i-hate-this-place- [link] [comments]  ( 1 min )
    AlphaGo Zero
    Hello, I have an alphago zero question. Why doesn’t alphago zero use Q(s,a) to choose its next move in the Monte Carlo tree search? Why does it use the π instead? submitted by /u/Skinnybisquit [link] [comments]
  • Open

    Take Your Machine Learning Skills Global
    Sponsored Post In our interconnected world, a decision made thousands of miles away can have lasting consequences for entire organizations or economies. When small changes have big effects, it is unsurprising that companies and governments are turning to machine learning and AI to accurately predict risk. ​ How the Global Community is Applying Machine Learning […] The post Take Your Machine Learning Skills Global appeared first on Machine Learning Mastery.  ( 2 min )
  • Open

    Fast Aquatic Swimmer Optimization with Differentiable Projective Dynamics and Neural Network Hydrodynamic Models. (arXiv:2204.12584v1 [cs.RO])
    Aquatic locomotion is a classic fluid-structure interaction (FSI) problem of interest to biologists and engineers. Solving the fully coupled FSI equations for incompressible Navier-Stokes and finite elasticity is computationally expensive. Optimizing robotic swimmer design within such a system generally involves cumbersome, gradient-free procedures on top of the already costly simulation. To address this challenge we present a novel, fully differentiable hybrid approach to FSI that combines a 2D direct numerical simulation for the deformable solid structure of the swimmer and a physics-constrained neural network surrogate to capture hydrodynamic effects of the fluid. For the deformable simulation of the swimmer's body, we use state-of-the-art techniques from the field of computer graphics to speed up the finite-element method (FEM). For the fluid simulation, we use a U-Net architecture trained with a physics-based loss function to predict the flow field at each time step. The pressure and velocity field outputs from the neural network are sampled around the boundary of our swimmer using an immersed boundary method (IBM) to compute its swimming motion accurately and efficiently. We demonstrate the computational efficiency and differentiability of our hybrid simulator on a 2D carangiform swimmer. Since both the solid simulator and the hydrodynamics model are automatically differentiable, we obtain a fully differentiable FSI simulator that can be used for computational co-design of geometry and controls for rigid and soft bodies immersed in fluids, such as minimizing drag, maximizing speed, or maximizing efficiency via direct gradient-based optimization.  ( 2 min )
    Zero-Touch Network on Industrial IoT: An End-to-End Machine Learning Approach. (arXiv:2204.12605v1 [cs.LG])
    Industry 4.0-enabled smart factory is expected to realize the next revolution for manufacturers. Although artificial intelligence (AI) technologies have improved productivity, current use cases belong to small-scale and single-task operations. To unbound the potential of smart factory, this paper develops zero-touch network systems for intelligent manufacturing and facilitates distributed AI applications in both training and inferring stages in a large-scale manner. The open radio access network (O-RAN) architecture is first introduced for the zero-touch platform to enable globally controlling communications and computation infrastructure capability in the field. The designed serverless framework allows intelligent and efficient learning assignments and resource allocations. Hence, requested learning tasks can be assigned to appropriate robots, and the underlying infrastructure can be used to support the learning tasks without expert knowledge. Moreover, due to the proposed network system's flexibility, powerful AI-enabled networking algorithms can be utilized to ensure service-level agreements and superior performances for factory workloads. Finally, three open research directions of backward compatibility, end-to-end enhancements, and cybersecurity are discussed for zero-touch smart factory.  ( 2 min )
    Data-driven detector signal characterization with constrained bottleneck autoencoders. (arXiv:2203.04604v4 [physics.ins-det] UPDATED)
    A common technique in high energy physics is to characterize the response of a detector by means of models tunned to data which build parametric maps from the physical parameters of the system to the expected signal of the detector. When the underlying model is unknown it is difficult to apply this method, and often, simplifying assumptions are made introducing modeling errors. In this article, using a waveform toy model we present how deep learning in the form of constrained bottleneck autoencoders can be used to learn the underlying unknown detector response model directly from data. The results show that excellent performance results can be achieved even when the signals are significantly affected by random noise. The trained algorithm can be used simultaneously to perform estimations on the physical parameters of the model, simulate the detector response with high fidelity and to denoise detector signals.  ( 2 min )
    A Survey on Machine Learning Approaches for Modelling Intuitive Physics. (arXiv:2202.06481v2 [cs.LG] UPDATED)
    Research in cognitive science has provided extensive evidence of human cognitive ability in performing physical reasoning of objects from noisy perceptual inputs. Such a cognitive ability is commonly known as intuitive physics. With advancements in deep learning, there is an increasing interest in building intelligent systems that are capable of performing physical reasoning from a given scene for the purpose of building better AI systems. As a result, many contemporary approaches in modelling intuitive physics for machine cognition have been inspired by literature from cognitive science. Despite the wide range of work in physical reasoning for machine cognition, there is a scarcity of reviews that organize and group these deep learning approaches. Especially at the intersection of intuitive physics and artificial intelligence, there is a need to make sense of the diverse range of ideas and approaches. Therefore, this paper presents a comprehensive survey of recent advances and techniques in intuitive physics-inspired deep learning approaches for physical reasoning. The survey will first categorize existing deep learning approaches into three facets of physical reasoning before organizing them into three general technical approaches and propose six categorical tasks of the field. Finally, we highlight the challenges of the current field and present some future research directions.  ( 2 min )
    Generating Examples From CLI Usage: Can Transformers Help?. (arXiv:2204.12648v1 [cs.SE])
    Continuous evolution in modern software often causes documentation, tutorials, and examples to be out of sync with changing interfaces and frameworks. Relying on outdated documentation and examples can lead programs to fail or be less efficient or even less secure. In response, programmers need to regularly turn to other resources on the web such as StackOverflow for examples to guide them in writing software. We recognize that this inconvenient, error-prone, and expensive process can be improved by using machine learning applied to software usage data. In this paper, we present our practical system which uses machine learning on large-scale telemetry data and documentation corpora, generating appropriate and complex examples that can be used to improve documentation. We discuss both feature-based and transformer-based machine learning approaches and demonstrate that our system achieves 100% coverage for the used functionalities in the product, providing up-to-date examples upon every release and reduces the numbers of PRs submitted by software owners writing and editing documentation by >68%. We also share valuable lessons learnt during the 3 years that our production quality system has been deployed for Azure Cloud Command Line Interface (Azure CLI).  ( 2 min )
    Generating Self-Serendipity Preference in Recommender Systems for Addressing Cold Start Problems. (arXiv:2204.12651v1 [cs.IR])
    Classical accuracy-oriented Recommender Systems (RSs) typically face the cold-start problem and the filter-bubble problem when users suffer the familiar, repeated, and even predictable recommendations, making them boring and unsatisfied. To address the above issues, serendipity-oriented RSs are proposed to recommend appealing and valuable items significantly deviating from users' historical interactions and thus satisfying them by introducing unexplored but relevant candidate items to them. In this paper, we devise a novel serendipity-oriented recommender system (\textbf{G}enerative \textbf{S}elf-\textbf{S}erendipity \textbf{R}ecommender \textbf{S}ystem, \textbf{GS$^2$-RS}) that generates users' self-serendipity preferences to enhance the recommendation performance. Specifically, this model extracts users' interest and satisfaction preferences, generates virtual but convincible neighbors' preferences from themselves, and achieves their self-serendipity preference. Then these preferences are injected into the rating matrix as additional information for RS models. Note that GS$^2$-RS can not only tackle the cold-start problem but also provides diverse but relevant recommendations to relieve the filter-bubble problem. Extensive experiments on benchmark datasets illustrate that the proposed GS$^2$-RS model can significantly outperform the state-of-the-art baseline approaches in serendipity measures with a stable accuracy performance.  ( 2 min )
    AI-Bind: Improving Binding Predictions for Novel Protein Targets and Ligands. (arXiv:2112.13168v4 [q-bio.QM] UPDATED)
    Identifying novel drug-target interactions (DTI) is a critical and rate limiting step in drug discovery. While deep learning models have been proposed to accelerate the identification process, we show that state-of-the-art models fail to generalize to novel (i.e., never-before-seen) structures. We first unveil the mechanisms responsible for this shortcoming, demonstrating how models rely on shortcuts that leverage the topology of the protein-ligand bipartite network, rather than learning the node features. Then, we introduce AI-Bind, a pipeline that combines network-based sampling strategies with unsupervised pre-training, allowing us to limit the annotation imbalance and improve binding predictions for novel proteins and ligands. We illustrate the value of AI-Bind by predicting drugs and natural compounds with binding affinity to SARS-CoV-2 viral proteins and the associated human proteins. We also validate these predictions via auto-docking simulations and comparison with recent experimental evidence, and step up the process of interpreting machine learning prediction of protein-ligand binding by identifying potential active binding sites on the amino acid sequence. Overall, AI-Bind offers a powerful high-throughput approach to identify drug-target combinations, with the potential of becoming a powerful tool in drug discovery.  ( 2 min )
    Trusted Multi-View Classification with Dynamic Evidential Fusion. (arXiv:2204.11423v2 [cs.LG] UPDATED)
    Existing multi-view classification algorithms focus on promoting accuracy by exploiting different views, typically integrating them into common representations for follow-up tasks. Although effective, it is also crucial to ensure the reliability of both the multi-view integration and the final decision, especially for noisy, corrupted and out-of-distribution data. Dynamically assessing the trustworthiness of each view for different samples could provide reliable integration. This can be achieved through uncertainty estimation. With this in mind, we propose a novel multi-view classification algorithm, termed trusted multi-view classification (TMC), providing a new paradigm for multi-view learning by dynamically integrating different views at an evidence level. The proposed TMC can promote classification reliability by considering evidence from each view. Specifically, we introduce the variational Dirichlet to characterize the distribution of the class probabilities, parameterized with evidence from different views and integrated with the Dempster-Shafer theory. The unified learning framework induces accurate uncertainty and accordingly endows the model with both reliability and robustness against possible noise or corruption. Both theoretical and experimental results validate the effectiveness of the proposed model in accuracy, robustness and trustworthiness.  ( 2 min )
    An Empirical Study of the Occurrence of Heavy-Tails in Training a ReLU Gate. (arXiv:2204.12554v1 [cs.LG])
    A particular direction of recent advance about stochastic deep-learning algorithms has been about uncovering a rather mysterious heavy-tailed nature of the stationary distribution of these algorithms, even when the data distribution is not so. Moreover, the heavy-tail index is known to show interesting dependence on the input dimension of the net, the mini-batch size and the step size of the algorithm. In this short note, we undertake an experimental study of this index for S.G.D. while training a $\relu$ gate (in the realizable and in the binary classification setup) and for a variant of S.G.D. that was proven in Karmakar and Mukherjee (2022) for ReLU realizable data. From our experiments we conjecture that these two algorithms have similar heavy-tail behaviour on any data where the latter can be proven to converge. Secondly, we demonstrate that the heavy-tail index of the late time iterates in this model scenario has strikingly different properties than either what has been proven for linear hypothesis classes or what has been previously demonstrated for large nets.  ( 2 min )
    Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models. (arXiv:2104.05158v6 [cs.DC] UPDATED)
    Deep learning recommendation models (DLRMs) are used across many business-critical services at Facebook and are the single largest AI application in terms of infrastructure demand in its data-centers. In this paper we discuss the SW/HW co-designed solution for high-performance distributed training of large-scale DLRMs. We introduce a high-performance scalable software stack based on PyTorch and pair it with the new evolution of Zion platform, namely ZionEX. We demonstrate the capability to train very large DLRMs with up to 12 Trillion parameters and show that we can attain 40X speedup in terms of time to solution over previous systems. We achieve this by (i) designing the ZionEX platform with dedicated scale-out network, provisioned with high bandwidth, optimal topology and efficient transport (ii) implementing an optimized PyTorch-based training stack supporting both model and data parallelism (iii) developing sharding algorithms capable of hierarchical partitioning of the embedding tables along row, column dimensions and load balancing them across multiple workers; (iv) adding high-performance core operators while retaining flexibility to support optimizers with fully deterministic updates (v) leveraging reduced precision communications, multi-level memory hierarchy (HBM+DDR+SSD) and pipelining. Furthermore, we develop and briefly comment on distributed data ingestion and other supporting services that are required for the robust and efficient end-to-end training in production environments.  ( 3 min )
    Neural Collapse Inspired Attraction-Repulsion-Balanced Loss for Imbalanced Learning. (arXiv:2204.08735v2 [cs.LG] UPDATED)
    Class imbalance distribution widely exists in real-world engineering. However, the mainstream optimization algorithms that seek to minimize error will trap the deep learning model in sub-optimums when facing extreme class imbalance. It seriously harms the classification precision, especially on the minor classes. The essential reason is that the gradients of the classifier weights are imbalanced among the components from different classes. In this paper, we propose Attraction-Repulsion-Balanced Loss (ARB-Loss) to balance the different components of the gradients. We perform experiments on the large-scale classification and segmentation datasets and our ARB-Loss can achieve state-of-the-art performance via only one-stage training instead of 2-stage learning like nowadays SOTA works.  ( 2 min )
    Performer: A Novel PPG to ECG Reconstruction Transformer For a Digital Biomarker of Cardiovascular Disease Detection. (arXiv:2204.11795v2 [eess.SP] UPDATED)
    Cardiovascular diseases (CVDs) have become the top one cause of death; three-quarters of these deaths occur in lower-income communities. Electrocardiography (ECG), an electrical measurement capturing the cardiac activities, is a gold-standard to diagnose CVDs. However, ECG is infeasible for continuous cardiac monitoring due to its requirement for user participation. Meanwhile, photoplethysmography (PPG) is easy to collect, but the limited accuracy constrains its clinical usage. In this research, a novel Transformer-based architecture, Performer, is invented to reconstruct ECG from PPG and to create a novel digital biomarker, PPG along with its reconstructed ECG, as multiple modalities for CVD detection. This architecture, for the first time, performs Transformer sequence to sequence translation on biomedical waveforms, while also utilizing the advantages of the easily accessible PPG and the well-studied base of ECG. Shifted Patch-based Attention (SPA) is created to maximize the signal features by fetching the various sequence lengths as hierarchical stages into the training while also capturing cross-patch connections through the shifted patch mechanism. This architecture generates a state-of-the-art performance of 0.29 RMSE for reconstructing ECG from PPG, achieving an average of 95.9% diagnosis for CVDs on the MIMIC III dataset and 75.9% for diabetes on the PPG-BP dataset. Performer, along with its novel digital biomarker, offers a low-cost and non-invasive solution for continuous cardiac monitoring, only requiring the easily extractable PPG data to reconstruct the not-as-accessible ECG data. As a prove of concept, an earring wearable, named PEARL (prototype), is designed to scale up the point-of-care (POC) healthcare system.  ( 2 min )
    An empirical study of the effect of background data size on the stability of SHapley Additive exPlanations (SHAP) for deep learning models. (arXiv:2204.11351v2 [cs.LG] UPDATED)
    Nowadays, the interpretation of why a machine learning (ML) model makes certain inferences is as crucial as the accuracy of such inferences. Some ML models like the decision tree possess inherent interpretability that can be directly comprehended by humans. Others like artificial neural networks (ANN), however, rely on external methods to uncover the deduction mechanism. SHapley Additive exPlanations (SHAP) is one of such external methods, which requires a background dataset when interpreting ANNs. Generally, a background dataset consists of instances randomly sampled from the training dataset. However, the sampling size and its effect on SHAP remain to be unexplored. In our empirical study on the MIMIC-III dataset, we show that the two core explanations - SHAP values and variable rankings fluctuate when using different background datasets acquired from random sampling, indicating that users cannot unquestioningly trust the one-shot interpretation from SHAP. Luckily, such fluctuation decreases with the increase of the background dataset size. Also, we notice an U-shape in the stability assessment of SHAP variable rankings, demonstrating that SHAP is more reliable in ranking the most and least important variables compared to moderately important ones. Overall, our results suggest that users should take into account how background data affects SHAP results, with improved SHAP stability as the background sample size increases.  ( 2 min )
    Data Debugging with Shapley Importance over End-to-End Machine Learning Pipelines. (arXiv:2204.11131v2 [cs.LG] UPDATED)
    Developing modern machine learning (ML) applications is data-centric, of which one fundamental challenge is to understand the influence of data quality to ML training -- "Which training examples are 'guilty' in making the trained ML model predictions inaccurate or unfair?" Modeling data influence for ML training has attracted intensive interest over the last decade, and one popular framework is to compute the Shapley value of each training example with respect to utilities such as validation accuracy and fairness of the trained ML model. Unfortunately, despite recent intensive interest and research, existing methods only consider a single ML model "in isolation" and do not consider an end-to-end ML pipeline that consists of data transformations, feature extractors, and ML training. We present DataScope (ease.ml/datascope), the first system that efficiently computes Shapley values of training examples over an end-to-end ML pipeline, and illustrate its applications in data debugging for ML training. To this end, we first develop a novel algorithmic framework that computes Shapley value over a specific family of ML pipelines that we call canonical pipelines: a positive relational algebra query followed by a K-nearest-neighbor (KNN) classifier. We show that, for many subfamilies of canonical pipelines, computing Shapley value is in PTIME, contrasting the exponential complexity of computing Shapley value in general. We then put this to practice -- given an sklearn pipeline, we approximate it with a canonical pipeline to use as a proxy. We conduct extensive experiments illustrating different use cases and utilities. Our results show that DataScope is up to four orders of magnitude faster over state-of-the-art Monte Carlo-based methods, while being comparably, and often even more, effective in data debugging.  ( 2 min )
    Reinforced Causal Explainer for Graph Neural Networks. (arXiv:2204.11028v2 [cs.LG] UPDATED)
    Explainability is crucial for probing graph neural networks (GNNs), answering questions like "Why the GNN model makes a certain prediction?". Feature attribution is a prevalent technique of highlighting the explanatory subgraph in the input graph, which plausibly leads the GNN model to make its prediction. Various attribution methods exploit gradient-like or attention scores as the attributions of edges, then select the salient edges with top attribution scores as the explanation. However, most of these works make an untenable assumption - the selected edges are linearly independent - thus leaving the dependencies among edges largely unexplored, especially their coalition effect. We demonstrate unambiguous drawbacks of this assumption - making the explanatory subgraph unfaithful and verbose. To address this challenge, we propose a reinforcement learning agent, Reinforced Causal Explainer (RC-Explainer). It frames the explanation task as a sequential decision process - an explanatory subgraph is successively constructed by adding a salient edge to connect the previously selected subgraph. Technically, its policy network predicts the action of edge addition, and gets a reward that quantifies the action's causal effect on the prediction. Such reward accounts for the dependency of the newly-added edge and the previously-added edges, thus reflecting whether they collaborate together and form a coalition to pursue better explanations. As such, RC-Explainer is able to generate faithful and concise explanations, and has a better generalization power to unseen graphs. When explaining different GNNs on three graph classification datasets, RC-Explainer achieves better or comparable performance to SOTA approaches w.r.t. predictive accuracy and contrastivity, and safely passes sanity checks and visual inspections. Codes are available at https://github.com/xiangwang1223/reinforced_causal_explainer.  ( 2 min )
    Long-term Spatio-temporal Forecasting via Dynamic Multiple-Graph Attention. (arXiv:2204.11008v2 [cs.LG] UPDATED)
    Many real-world ubiquitous applications, such as parking recommendations and air pollution monitoring, benefit significantly from accurate long-term spatio-temporal forecasting (LSTF). LSTF makes use of long-term dependency between spatial and temporal domains, contextual information, and inherent pattern in the data. Recent studies have revealed the potential of multi-graph neural networks (MGNNs) to improve prediction performance. However, existing MGNN methods cannot be directly applied to LSTF due to several issues: the low level of generality, insufficient use of contextual information, and the imbalanced graph fusion approach. To address these issues, we construct new graph models to represent the contextual information of each node and the long-term spatio-temporal data dependency structure. To fuse the information across multiple graphs, we propose a new dynamic multi-graph fusion module to characterize the correlations of nodes within a graph and the nodes across graphs via the spatial attention and graph attention mechanisms. Furthermore, we introduce a trainable weight tensor to indicate the importance of each node in different graphs. Extensive experiments on two large-scale datasets demonstrate that our proposed approaches significantly improve the performance of existing graph neural network models in LSTF prediction tasks.  ( 2 min )
    Sublinear Time Approximation of Text Similarity Matrices. (arXiv:2112.09631v3 [cs.LG] UPDATED)
    We study algorithms for approximating pairwise similarity matrices that arise in natural language processing. Generally, computing a similarity matrix for $n$ data points requires $\Omega(n^2)$ similarity computations. This quadratic scaling is a significant bottleneck, especially when similarities are computed via expensive functions, e.g., via transformer models. Approximation methods reduce this quadratic complexity, often by using a small subset of exactly computed similarities to approximate the remainder of the complete pairwise similarity matrix. Significant work focuses on the efficient approximation of positive semidefinite (PSD) similarity matrices, which arise e.g., in kernel methods. However, much less is understood about indefinite (non-PSD) similarity matrices, which often arise in NLP. Motivated by the observation that many of these matrices are still somewhat close to PSD, we introduce a generalization of the popular Nystr\"{o}m method to the indefinite setting. Our algorithm can be applied to any similarity matrix and runs in sublinear time in the size of the matrix, producing a rank-$s$ approximation with just $O(ns)$ similarity computations. We show that our method, along with a simple variant of CUR decomposition, performs very well in approximating a variety of similarity matrices arising in NLP tasks. We demonstrate high accuracy of the approximated similarity matrices in the downstream tasks of document classification, sentence similarity, and cross-document coreference.  ( 2 min )
    AstBERT: Enabling Language Model for Financial Code Understanding with Abstract Syntax Trees. (arXiv:2201.07984v2 [cs.AI] UPDATED)
    Using the pre-trained language model (i.e. BERT) to apprehend source codes has attracted increasing attention from financial institutions owing to the great potential to uncover financial risks. However, there are several challenges in applying these language models to directly solve programming language (PL) related problems. To this end, we propose the AstBERT model, a pre-trained language model aiming to better understand the financial PL using the abstract syntax tree (AST). Specifically, we collect a colossal amount of source codes (both Java and Python) from the Alipay code repository and incorporate both syntactic and semantic code knowledge into our model through the help of code parsers, in which AST information of the source codes can be interpreted and integrated. We evaluate the performance of the proposed model on three tasks, including code question answering, code clone detection and code refinement. Experiment results show that our AstBERT achieves promising performance on three downstream tasks.  ( 2 min )
    ROMNet: Renovate the Old Memories. (arXiv:2202.02606v2 [eess.IV] UPDATED)
    Renovating the memories in old photos is an intriguing research topic in computer vision fields. These legacy images often suffer from severe and commingled degradations such as cracks, noise, and color-fading, while lack of large-scale paired old photo datasets makes this restoration task very challenging. In this work, we present a novel reference-based end-to-end learning framework that can jointly repair and colorize the degraded legacy pictures. Specifically, the proposed framework consists of three modules: a restoration sub-network for degradation restoration, a similarity sub-network for color histogram matching and transfer, and a colorization subnet that learns to predict the chroma elements of the images conditioned on chromatic reference signals. The whole system takes advantage of the color histogram priors in a given reference image, which vastly reduces the dependency on large-scale training data. Apart from the proposed method, we also create, to our knowledge, the first public and real-world old photo dataset with paired ground truth for evaluating old photo restoration models, wherein each old photo is paired with a manually restored pristine image by PhotoShop experts. Our extensive experiments conducted on both synthetic and real-world datasets demonstrate that our method significantly outperforms state-of-the-arts both quantitatively and qualitatively.  ( 2 min )
    LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval. (arXiv:2203.06169v2 [cs.CL] UPDATED)
    In this paper, we propose LaPraDoR, a pretrained dual-tower dense retriever that does not require any supervised data for training. Specifically, we first present Iterative Contrastive Learning (ICoL) that iteratively trains the query and document encoders with a cache mechanism. ICoL not only enlarges the number of negative instances but also keeps representations of cached examples in the same hidden space. We then propose Lexicon-Enhanced Dense Retrieval (LEDR) as a simple yet effective way to enhance dense retrieval with lexical matching. We evaluate LaPraDoR on the recently proposed BEIR benchmark, including 18 datasets of 9 zero-shot text retrieval tasks. Experimental results show that LaPraDoR achieves state-of-the-art performance compared with supervised dense retrieval models, and further analysis reveals the effectiveness of our training strategy and objectives. Compared to re-ranking, our lexicon-enhanced approach can be run in milliseconds (22.5x faster) while achieving superior performance.  ( 2 min )
    Accelerated Proximal Alternating Gradient-Descent-Ascent for Nonconvex Minimax Machine Learning. (arXiv:2112.11663v6 [cs.LG] UPDATED)
    Alternating gradient-descent-ascent (AltGDA) is an optimization algorithm that has been widely used for model training in various machine learning applications, which aims to solve a nonconvex minimax optimization problem. However, the existing studies show that it suffers from a high computation complexity in nonconvex minimax optimization. In this paper, we develop a single-loop and fast AltGDA-type algorithm that leverages proximal gradient updates and momentum acceleration to solve regularized nonconvex minimax optimization problems. By leveraging the momentum acceleration technique, we prove that the algorithm converges to a critical point in nonconvex minimax optimization and achieves a computation complexity in the order of $\mathcal{O}(\kappa^{\frac{11}{6}}\epsilon^{-2})$, where $\epsilon$ is the desired level of accuracy and $\kappa$ is the problem's condition number. {Such a computation complexity improves the state-of-the-art complexities of single-loop GDA and AltGDA algorithms (see the summary of comparison in \Cref{table1})}. We demonstrate the effectiveness of our algorithm via an experiment on adversarial deep learning.  ( 2 min )
    Closing the Gap between Single-User and Multi-User VoiceFilter-Lite. (arXiv:2202.12169v2 [eess.AS] UPDATED)
    VoiceFilter-Lite is a speaker-conditioned voice separation model that plays a crucial role in improving speech recognition and speaker verification by suppressing overlapping speech from non-target speakers. However, one limitation of VoiceFilter-Lite, and other speaker-conditioned speech models in general, is that these models are usually limited to a single target speaker. This is undesirable as most smart home devices now support multiple enrolled users. In order to extend the benefits of personalization to multiple users, we previously developed an attention-based speaker selection mechanism and applied it to VoiceFilter-Lite. However, the original multi-user VoiceFilter-Lite model suffers from significant performance degradation compared with single-user models. In this paper, we devised a series of experiments to improve the multi-user VoiceFilter-Lite model. By incorporating a dual learning rate schedule and by using feature-wise linear modulation (FiLM) to condition the model with the attended speaker embedding, we successfully closed the performance gap between multi-user and single-user VoiceFilter-Lite models on single-speaker evaluations. At the same time, the new model can also be easily extended to support any number of users, and significantly outperforms our previously published model on multi-speaker evaluations.  ( 2 min )
    GNMR: A provable one-line algorithm for low rank matrix recovery. (arXiv:2106.12933v3 [math.OC] UPDATED)
    Low rank matrix recovery problems, including matrix completion and matrix sensing, appear in a broad range of applications. In this work we present GNMR -- an extremely simple iterative algorithm for low rank matrix recovery, based on a Gauss-Newton linearization. On the theoretical front, we derive recovery guarantees for GNMR in both the matrix sensing and matrix completion settings. Some of these results improve upon the best currently known for other methods. A key property of GNMR is that it implicitly keeps the factor matrices approximately balanced throughout its iterations. On the empirical front, we show that for matrix completion with uniform sampling, GNMR performs better than several popular methods, especially when given very few observations close to the information limit.  ( 2 min )
    Differentially Private SGDA for Minimax Problems. (arXiv:2201.09046v3 [cs.LG] UPDATED)
    Stochastic gradient descent ascent (SGDA) and its variants have been the workhorse for solving minimax problems. However, in contrast to the well-studied stochastic gradient descent (SGD) with differential privacy (DP) constraints, there is little work on understanding the generalization (utility) of SGDA with DP constraints. In this paper, we use the algorithmic stability approach to establish the generalization (utility) of DP-SGDA in different settings. In particular, for the convex-concave setting, we prove that the DP-SGDA can achieve an optimal utility rate in terms of the weak primal-dual population risk in both smooth and non-smooth cases. To our best knowledge, this is the first-ever-known result for DP-SGDA in the non-smooth case. We further provide its utility analysis in the nonconvex-strongly-concave setting which is the first-ever-known result in terms of the primal population risk. The convergence and generalization results for this nonconvex setting are new even in the non-private setting. Finally, numerical experiments are conducted to demonstrate the effectiveness of DP-SGDA for both convex and nonconvex cases.  ( 2 min )
    ISNet: Costless and Implicit Image Segmentation for Deep Classifiers, with Application in COVID-19 Detection. (arXiv:2202.00232v3 [eess.IV] UPDATED)
    This work proposes a novel deep neural network (DNN) architecture, Implicit Segmentation Neural Network (ISNet), to solve the task of image segmentation followed by classification. It substitutes the common pipeline of two DNNs with a single model. We designed the ISNet for high flexibility and performance: it allows virtually any classification neural network architecture to analyze a common image as if it had been previously segmented. Furthermore, in relation to the unmodified classifier, the ISNet does not cause any increment in computational cost at run-time. We test the architecture with two applications: COVID-19 detection in chest X-rays, and facial attribute estimation. We implement an ISNet based on a DenseNet121 classifier, and compare the model to a U-net (performing lung/face segmentation) followed by a DenseNet121, and to a standalone DenseNet121. The new architecture matched the other DNNs in facial attribute estimation. Moreover, it strongly surpassed them in COVID-19 detection, according to an external test dataset. The ISNet precisely ignored the image regions outside of the lungs or faces. Therefore, in COVID-19 detection it reduced the effects of background bias and shortcut learning, and it improved security in facial attribute estimation. ISNet presents an accurate, fast, and light methodology. The successful implicit segmentation, considering two largely diverse fields, highlights the architecture's general applicability.  ( 3 min )
    Variational Learning for Unsupervised Knowledge Grounded Dialogs. (arXiv:2112.00653v3 [cs.CL] UPDATED)
    Recent methods for knowledge grounded dialogs generate responses by incorporating information from an external textual document. These methods do not require the exact document to be known during training and rely on the use of a retrieval system to fetch relevant documents from a large index. The documents used to generate the responses are modeled as latent variables whose prior probabilities need to be estimated. Models such as RAG and REALM, marginalize the document probabilities over the documents retrieved from the index to define the log likelihood loss function which is optimized end-to-end. In this paper, we develop a variational approach to the above technique wherein, we instead maximize the Evidence Lower bound (ELBO). Using a collection of three publicly available open-conversation datasets, we demonstrate how the posterior distribution, that has information from the ground-truth response, allows for a better approximation of the objective function during training. To overcome the challenges associated with sampling over a large knowledge collection, we develop an efficient approach to approximate the ELBO. To the best of our knowledge we are the first to apply variational training for open-scale unsupervised knowledge grounded dialog systems.  ( 2 min )
    Single-pass Object-adaptive Data Undersampling and Reconstruction for MRI. (arXiv:2111.09212v2 [eess.IV] UPDATED)
    There is much recent interest in techniques to accelerate the data acquisition process in MRI by acquiring limited measurements. Often sophisticated reconstruction algorithms are deployed to maintain high image quality in such settings. In this work, we propose a data-driven sampler using a convolutional neural network, MNet, to provide object-specific sampling patterns adaptive to each scanned object. The network observes very limited low-frequency k-space data for each object and rapidly predicts the desired undersampling pattern in one go that achieves high image reconstruction quality. We propose an accompanying alternating-type training framework with a mask-backward procedure that efficiently generates training labels for the sampler network and jointly trains an image reconstruction network. Experimental results on the fastMRI knee dataset demonstrate the ability of the proposed learned undersampling network to generate object-specific masks at fourfold and eightfold acceleration that achieve superior image reconstruction performance than several existing schemes. The source code for the proposed joint sampling and reconstruction learning framework is available at https://github.com/zhishenhuang/mri.  ( 2 min )
    Efficient Learning of the Parameters of Non-Linear Models using Differentiable Resampling in Particle Filters. (arXiv:2111.01409v2 [stat.ML] UPDATED)
    It has been widely documented that the sampling and resampling steps in particle filters cannot be differentiated. The {\itshape reparameterisation trick} was introduced to allow the sampling step to be reformulated into a differentiable function. We extend the {\itshape reparameterisation trick} to include the stochastic input to resampling therefore limiting the discontinuities in the gradient calculation after this step. Knowing the gradients of the prior and likelihood allows us to run particle Markov Chain Monte Carlo (p-MCMC) and use the No-U-Turn Sampler (NUTS) as the proposal when estimating parameters. We compare the Metropolis-adjusted Langevin algorithm (MALA), Hamiltonian Monte Carlo with different number of steps and NUTS. We consider two state-space models and show that NUTS improves the mixing of the Markov chain and can produce more accurate results in less computational time.  ( 2 min )
    Building separable approximations for quantum states via neural networks. (arXiv:2112.08055v4 [quant-ph] UPDATED)
    Finding the closest separable state to a given target state is a notoriously difficult task, even more difficult than deciding whether a state is entangled or separable. To tackle this task, we parametrize separable states with a neural network and train it to minimize the distance to a given target state, with respect to a differentiable distance, such as the trace distance or Hilbert--Schmidt distance. By examining the output of the algorithm, we obtain an upper bound on the entanglement of the target state, and construct an approximation for its closest separable state. We benchmark the method on a variety of well-known classes of bipartite states and find excellent agreement, even up to local dimension of $d=10$, while providing conjectures and analytic insight for isotropic and Werner states. Moreover, we show our method to be efficient in the multipartite case, considering different notions of separability. Examining three and four-party GHZ and W states we recover known bounds and obtain novel ones, for instance for triseparability.  ( 2 min )
    Automatic Synthesis of Diverse Weak Supervision Sources for Behavior Analysis. (arXiv:2111.15186v2 [cs.LG] UPDATED)
    Obtaining annotations for large training sets is expensive, especially in settings where domain knowledge is required, such as behavior analysis. Weak supervision has been studied to reduce annotation costs by using weak labels from task-specific labeling functions (LFs) to augment ground truth labels. However, domain experts still need to hand-craft different LFs for different tasks, limiting scalability. To reduce expert effort, we present AutoSWAP: a framework for automatically synthesizing data-efficient task-level LFs. The key to our approach is to efficiently represent expert knowledge in a reusable domain-specific language and more general domain-level LFs, with which we use state-of-the-art program synthesis techniques and a small labeled dataset to generate task-level LFs. Additionally, we propose a novel structural diversity cost that allows for efficient synthesis of diverse sets of LFs, further improving AutoSWAP's performance. We evaluate AutoSWAP in three behavior analysis domains and demonstrate that AutoSWAP outperforms existing approaches using only a fraction of the data. Our results suggest that AutoSWAP is an effective way to automatically generate LFs that can significantly reduce expert effort for behavior analysis.  ( 2 min )
    Regularized Newton Method with Global $O(1/k^2)$ Convergence. (arXiv:2112.02089v2 [math.OC] UPDATED)
    We present a Newton-type method that converges fast from any initialization and for arbitrary convex objectives with Lipschitz Hessians. We achieve this by merging the ideas of cubic regularization with a certain adaptive Levenberg--Marquardt penalty. In particular, we show that the iterates given by $x^{k+1}=x^k - \bigl(\nabla^2 f(x^k) + \sqrt{H\|\nabla f(x^k)\|} \mathbf{I}\bigr)^{-1}\nabla f(x^k)$, where $H>0$ is a constant, converge globally with a $\mathcal{O}(\frac{1}{k^2})$ rate. Our method is the first variant of Newton's method that has both cheap iterations and provably fast global convergence. Moreover, we prove that locally our method converges superlinearly when the objective is strongly convex. To boost the method's performance, we present a line search procedure that does not need hyperparameters and is provably efficient.  ( 2 min )
    Physics-Driven Learning of Wasserstein GAN for Density Reconstruction in Dynamic Tomography. (arXiv:2110.15424v2 [eess.IV] UPDATED)
    Object density reconstruction from projections containing scattered radiation and noise is of critical importance in many applications. Existing scatter correction and density reconstruction methods may not provide the high accuracy needed in many applications and can break down in the presence of unmodeled or anomalous scatter and other experimental artifacts. Incorporating machine-learned models could prove beneficial for accurate density reconstruction particularly in dynamic imaging, where the time-evolution of the density fields could be captured by partial differential equations or by learning from hydrodynamics simulations. In this work, we demonstrate the ability of learned deep neural networks to perform artifact removal in noisy density reconstructions, where the noise is imperfectly characterized. We use a Wasserstein generative adversarial network (WGAN), where the generator serves as a denoiser that removes artifacts in densities obtained from traditional reconstruction algorithms. We train the networks from large density time-series datasets, with noise simulated according to parametric random distributions that may mimic noise in experiments. The WGAN is trained with noisy density frames as generator inputs, to match the generator outputs to the distribution of clean densities (time-series) from simulations. A supervised loss is also included in the training, which leads to improved density restoration performance. In addition, we employ physics-based constraints such as mass conservation during network training and application to further enable highly accurate density reconstructions. Our preliminary numerical results show that the models trained in our frameworks can remove significant portions of unknown noise in density time-series data.  ( 2 min )
    A Study of Fake News Reading and Annotating in Social Media Context. (arXiv:2109.12523v2 [cs.HC] UPDATED)
    The online spreading of fake news is a major issue threatening entire societies. Much of this spreading is enabled by new media formats, namely social networks and online media sites. Researchers and practitioners have been trying to answer this by characterizing the fake news and devising automated methods for detecting them. The detection methods had so far only limited success, mostly due to the complexity of the news content and context and lack of properly annotated datasets. One possible way to boost the efficiency of automated misinformation detection methods, is to imitate the detection work of humans. It is also important to understand the news consumption behavior of online users. In this paper, we present an eye-tracking study, in which we let 44 lay participants to casually read through a social media feed containing posts with news articles, some of which were fake. In a second run, we asked the participants to decide on the truthfulness of these articles. We also describe a follow-up qualitative study with a similar scenario but this time with 7 expert fake news annotators. We present the description of both studies, characteristics of the resulting dataset (which we hereby publish) and several findings.  ( 2 min )
    Encoding Involutory Invariances in Neural Networks. (arXiv:2106.12891v2 [cs.LG] UPDATED)
    In certain situations, neural networks are trained upon data that obey underlying symmetries. However, the predictions do not respect the symmetries exactly unless embedded in the network structure. In this work, we introduce architectures that embed a special kind of symmetry namely, invariance with respect to involutory linear/affine transformations up to parity $p=\pm 1$. We provide rigorous theorems to show that the proposed network ensures such an invariance and present qualitative arguments for a special universal approximation theorem. An adaption of our techniques to CNN tasks for datasets with inherent horizontal/vertical reflection symmetry is demonstrated. Extensive experiments indicate that the proposed model outperforms baseline feed-forward and physics-informed neural networks while identically respecting the underlying symmetry.  ( 2 min )
    Optimal Epidemic Control as a Contextual Combinatorial Bandit with Budget. (arXiv:2106.15808v2 [cs.LG] UPDATED)
    In light of the COVID-19 pandemic, it is an open challenge and critical practical problem to find a optimal way to dynamically prescribe the best policies that balance both the governmental resources and epidemic control in different countries and regions. To solve this multi-dimensional tradeoff of exploitation and exploration, we formulate this technical challenge as a contextual combinatorial bandit problem that jointly optimizes a multi-criteria reward function. Given the historical daily cases in a region and the past intervention plans in place, the agent should generate useful intervention plans that policy makers can implement in real time to minimizing both the number of daily COVID-19 cases and the stringency of the recommended interventions. We prove this concept with simulations of multiple realistic policy making scenarios and demonstrate a clear advantage in providing a pareto optimal solution in the epidemic intervention problem.  ( 2 min )
    Leveraging power grid topology in machine learning assisted optimal power flow. (arXiv:2110.00306v3 [cs.LG] UPDATED)
    Machine learning assisted optimal power flow (OPF) aims to reduce the computational complexity of these non-linear and non-convex constrained optimization problems by consigning expensive (online) optimization to offline training. The majority of work in this area typically employs fully connected neural networks (FCNN). However, recently convolutional (CNN) and graph (GNN) neural networks have also been investigated, in effort to exploit topological information within the power grid. Although promising results have been obtained, there lacks a systematic comparison between these architectures throughout literature. Accordingly, we introduce a concise framework for generalizing methods for machine learning assisted OPF and assess the performance of a variety of FCNN, CNN and GNN models for two fundamental approaches in this domain: regression (predicting optimal generator set-points) and classification (predicting the active set of constraints). For several synthetic power grids with interconnected utilities, we show that locality properties between feature and target variables are scarce and subsequently demonstrate marginal utility of applying CNN and GNN architectures compared to FCNN for a fixed grid topology. However, with variable topology (for instance, modeling transmission line contingency), GNN models are able to straightforwardly take the change of topological information into account and outperform both FCNN and CNN models.  ( 2 min )
    Maximum Entropy Dueling Network Architecture in Atari Domain. (arXiv:2107.14457v2 [cs.LG] UPDATED)
    In recent years, there have been many deep structures for Reinforcement Learning, mainly for value function estimation and representations. These methods achieved great success in Atari 2600 domain. In this paper, we propose an improved architecture based upon Dueling Networks, in this architecture, there are two separate estimators, one approximate the state value function and the other, state advantage function. This improvement based on Maximum Entropy, shows better policy evaluation compared to the original network and other value-based architectures in Atari domain.  ( 2 min )
    Back2Future: Leveraging Backfill Dynamics for Improving Real-time Predictions in Future. (arXiv:2106.04420v8 [cs.LG] UPDATED)
    In real-time forecasting in public health, data collection is a non-trivial and demanding task. Often after initially released, it undergoes several revisions later (maybe due to human or technical constraints) - as a result, it may take weeks until the data reaches to a stable value. This so-called 'backfill' phenomenon and its effect on model performance has been barely studied in the prior literature. In this paper, we introduce the multi-variate backfill problem using COVID-19 as the motivating example. We construct a detailed dataset composed of relevant signals over the past year of the pandemic. We then systematically characterize several patterns in backfill dynamics and leverage our observations for formulating a novel problem and neural framework Back2Future that aims to refines a given model's predictions in real-time. Our extensive experiments demonstrate that our method refines the performance of top models for COVID-19 forecasting, in contrast to non-trivial baselines, yielding 18% improvement over baselines, enabling us obtain a new SOTA performance. In addition, we show that our model improves model evaluation too; hence policy-makers can better understand the true accuracy of forecasting models in real-time.  ( 3 min )
    SALIENCE: An Unsupervised User Adaptation Model for Multiple Wearable Sensors Based Human Activity Recognition. (arXiv:2108.10213v2 [eess.SP] UPDATED)
    Unsupervised user adaptation aligns the feature distributions of the data from training users and the new user, so a well-trained wearable human activity recognition (WHAR) model can be well adapted to the new user. With the development of wearable sensors, multiple wearable sensors based WHAR is gaining more and more attention. In order to address the challenge that the transferabilities of different sensors are different, we propose SALIENCE (unsupervised user adaptation model for multiple wearable sensors based human activity recognition) model. It aligns the data of each sensor separately to achieve local alignment, while uniformly aligning the data of all sensors to ensure global alignment. In addition, an attention mechanism is proposed to focus the activity classifier of SALIENCE on the sensors with strong feature discrimination and well distribution alignment. Experiments are conducted on two public WHAR datasets, and the experimental results show that our model can yield a competitive performance.  ( 2 min )
    SeqDialN: Sequential Visual Dialog Networks in Joint Visual-Linguistic Representation Space. (arXiv:2008.00397v2 [cs.CV] UPDATED)
    In this work, we formulate a visual dialog as an information flow in which each piece of information is encoded with the joint visual-linguistic representation of a single dialog round. Based on this formulation, we consider the visual dialog task as a sequence problem consisting of ordered visual-linguistic vectors. For featurization, we use a Dense Symmetric Co-Attention network as a lightweight vison-language joint representation generator to fuse multimodal features (i.e., image and text), yielding better computation and data efficiencies. For inference, we propose two Sequential Dialog Networks (SeqDialN): the first uses LSTM for information propagation (IP) and the second uses a modified Transformer for multi-step reasoning (MR). Our architecture separates the complexity of multimodal feature fusion from that of inference, which allows simpler design of the inference engine. IP based SeqDialN is our baseline with a simple 2-layer LSTM design that achieves decent performance. MR based SeqDialN, on the other hand, recurrently refines the semantic question/history representations through the self-attention stack of Transformer and produces promising results on the visual dialog task. On VisDial v1.0 test-std dataset, our best single generative SeqDialN achieves 62.54% NDCG and 48.63% MRR; our ensemble generative SeqDialN achieves 63.78% NDCG and 49.98% MRR, which set a new state-of-the-art generative visual dialog model. We fine-tune discriminative SeqDialN with dense annotations and boost the performance up to 72.41% NDCG and 55.11% MRR. In this work, we discuss the extensive experiments we have conducted to demonstrate the effectiveness of our model components. We also provide visualization for the reasoning process from the relevant conversation rounds and discuss our fine-tuning methods. Our code is available at https://github.com/xiaoxiaoheimei/SeqDialN  ( 3 min )
    IH-GAN: A Conditional Generative Model for Implicit Surface-Based Inverse Design of Cellular Structures. (arXiv:2103.02588v4 [cs.CE] UPDATED)
    Variable-density cellular structures can overcome connectivity and manufacturability issues of topologically optimized structures, particularly those represented as discrete density maps. However, the optimization of such cellular structures is challenging due to the multiscale design problem. Past work addressing this problem generally either only optimizes the volume fraction of single-type unit cells but ignores the effects of unit cell geometry on properties, or considers the geometry-property relation but builds this relation via heuristics. In contrast, we propose a simple yet more principled way to accurately model the property to geometry mapping using a conditional deep generative model, named Inverse Homogenization Generative Adversarial Network (IH-GAN). It learns the conditional distribution of unit cell geometries given properties and can realize the one-to-many mapping from properties to geometries. We further reduce the complexity of IH-GAN by using the implicit function parameterization to represent unit cell geometries. Results show that our method can 1) generate various unit cells that satisfy given material properties with high accuracy ($R^2$-scores between target properties and properties of generated unit cells $>98\%$) and 2) improve the optimized structural performance over the conventional variable-density single-type structure. In the minimum compliance example, our IH-GAN generated structure achieves a $79.7\%$ reduction in concentrated stress and an extra $3.03\%$ reduction in displacement. In the target deformation examples, our IH-GAN generated structure reduces the target matching error by $86.4\%$ and $79.6\%$ for two test cases, respectively. We also demonstrated that the connectivity issue for multi-type unit cells can be solved by transition layer blending.  ( 3 min )
    Unify Local and Global Information for Top-$N$ Recommendation. (arXiv:2012.01635v2 [cs.IR] UPDATED)
    Knowledge graph (KG), integrating complex information and containing rich semantics, is widely considered as side information to enhance the recommendation systems. However, most of the existing KG-based methods concentrate on encoding the structural information in the graph, without utilizing the collaborative signals in user-item interaction data, which are important for understanding user preferences. Therefore, the representations learned by these models are insufficient for representing semantic information of users and items in the recommendation environment. The combination of both kinds of data provides a good chance to solve this problem. To tackle this research gap, we propose a novel duet representation learning framework named \sysname to fuse local information (user-item interaction data) and global information (external knowledge graph) for the top-$N$ recommendation, which is composed of two separate sub-models. One learns the local representations by discovering the inner correlations in local information with a knowledge-aware co-attention mechanism, and another learns the global representations by encoding the knowledge associations in global information with a relation-aware attention network. The two sub-models are jointly trained as part of the semantic fusion network to compute the user preferences, which discriminates the contribution of the two sub-models under the special context. We conduct experiments on two real-world datasets, and the evaluations show that KADM significantly outperforms state-of-art methods. Further ablation studies confirm that the duet architecture performs significantly better than either sub-model on the recommendation tasks.  ( 2 min )
    Variational Kalman Filtering with Hinf-Based Correction for Robust Bayesian Learning in High Dimensions. (arXiv:2204.13089v1 [stat.ML])
    In this paper, we address the problem of convergence of sequential variational inference filter (VIF) through the application of a robust variational objective and Hinf-norm based correction for a linear Gaussian system. As the dimension of state or parameter space grows, performing the full Kalman update with the dense covariance matrix for a large scale system requires increased storage and computational complexity, making it impractical. The VIF approach, based on mean-field Gaussian variational inference, reduces this burden through the variational approximation to the covariance usually in the form of a diagonal covariance approximation. The challenge is to retain convergence and correct for biases introduced by the sequential VIF steps. We desire a framework that improves feasibility while still maintaining reasonable proximity to the optimal Kalman filter as data is assimilated. To accomplish this goal, a Hinf-norm based optimization perturbs the VIF covariance matrix to improve robustness. This yields a novel VIF- Hinf recursion that employs consecutive variational inference and Hinf based optimization steps. We explore the development of this method and investigate a numerical example to illustrate the effectiveness of the proposed filter.  ( 2 min )
    Exoskeleton-Based Multimodal Action and Movement Recognition: Identifying and Developing the Optimal Boosted Learning Approach. (arXiv:2106.10331v2 [cs.RO] UPDATED)
    This paper makes two scientific contributions to the field of exoskeleton-based action and movement recognition. First, it presents a novel machine learning and pattern recognition-based framework that can detect a wide range of actions and movements - walking, walking upstairs, walking downstairs, sitting, standing, lying, stand to sit, sit to stand, sit to lie, lie to sit, stand to lie, and lie to stand, with an overall accuracy of 82.63%. Second, it presents a comprehensive comparative study of different learning approaches - Random Forest, Artificial Neural Network, Decision Tree, Multiway Decision Tree, Support Vector Machine, k-NN, Gradient Boosted Trees, Decision Stump, AutoMLP, Linear Regression, Vector Linear Regression, Random Tree, Na\"ive Bayes, Na\"ive Bayes (Kernel), Linear Discriminant Analysis, Quadratic Discriminant Analysis, and Deep Learning applied to this framework. The performance of each of these learning approaches was boosted by using the AdaBoost algorithm, and the Cross Validation approach was used for training and testing. The results show that in boosted form, the k-NN classifier outperforms all the other boosted learning approaches and is, therefore, the optimal learning method for this purpose. The results presented and discussed uphold the importance of this work to contribute towards augmenting the abilities of exoskeleton-based assisted and independent living of the elderly in the future of Internet of Things-based living environments, such as Smart Homes. As a specific use case, we also discuss how the findings of our work are relevant for augmenting the capabilities of the Hybrid Assistive Limb exoskeleton, a highly functional lower limb exoskeleton.  ( 2 min )
    Residual Contrastive Learning for Image Reconstruction: Learning Transferable Representations from Noisy Images. (arXiv:2106.10070v2 [cs.CV] UPDATED)
    This paper is concerned with contrastive learning (CL) for low-level image restoration and enhancement tasks. We propose a new label-efficient learning paradigm based on residuals, residual contrastive learning (RCL), and derive an unsupervised visual representation learning framework, suitable for low-level vision tasks with noisy inputs. While supervised image reconstruction aims to minimize residual terms directly, RCL alternatively builds a connection between residuals and CL by defining a novel instance discrimination pretext task, using residuals as the discriminative feature. Our formulation mitigates the severe task misalignment between instance discrimination pretext tasks and downstream image reconstruction tasks, present in existing CL frameworks. Experimentally, we find that RCL can learn robust and transferable representations that improve the performance of various downstream tasks, such as denoising and super resolution, in comparison with recent self-supervised methods designed specifically for noisy inputs. Additionally, our unsupervised pre-training can significantly reduce annotation costs whilst maintaining performance competitive with fully-supervised image reconstruction.  ( 2 min )
    Neural String Edit Distance. (arXiv:2104.08388v2 [cs.CL] UPDATED)
    We propose the neural string edit distance model for string-pair matching and string transduction based on learnable string edit distance. We modify the original expectation-maximization learned edit distance algorithm into a differentiable loss function, allowing us to integrate it into a neural network providing a contextual representation of the input. We evaluate on cognate detection, transliteration, and grapheme-to-phoneme conversion, and show that we can trade off between performance and interpretability in a single framework. Using contextual representations, which are difficult to interpret, we match the performance of state-of-the-art string-pair matching models. Using static embeddings and a slightly different loss function, we force interpretability, at the expense of an accuracy drop.  ( 2 min )
    Federated Reconstruction: Partially Local Federated Learning. (arXiv:2102.03448v6 [cs.LG] UPDATED)
    Personalization methods in federated learning aim to balance the benefits of federated and local training for data availability, communication cost, and robustness to client heterogeneity. Approaches that require clients to communicate all model parameters can be undesirable due to privacy and communication constraints. Other approaches require always-available or stateful clients, impractical in large-scale cross-device settings. We introduce Federated Reconstruction, the first model-agnostic framework for partially local federated learning suitable for training and inference at scale. We motivate the framework via a connection to model-agnostic meta learning, empirically demonstrate its performance over existing approaches for collaborative filtering and next word prediction, and release an open-source library for evaluating approaches in this setting. We also describe the successful deployment of this approach at scale for federated collaborative filtering in a mobile keyboard application.  ( 2 min )
    Rethinking the Promotion Brought by Contrastive Learning to Semi-Supervised Node Classification. (arXiv:2012.07437v2 [cs.LG] UPDATED)
    Graph Contrastive Learning (GCL) has proven highly effective in promoting the performance of Semi-Supervised Node Classification (SSNC). However, existing GCL methods are generally transferred from other fields like CV or NLP, whose underlying working mechanism remains under-explored. In this work, we first deeply probe the working mechanism of GCL in SSNC, and find that the promotion brought by GCL is severely unevenly distributed: the improvement mainly comes from subgraphs with less annotated information, which is fundamentally different from contrastive learning in other fields. However, existing GCL methods generally ignore this uneven distribution of annotated information and apply GCL evenly to the whole graph. To remedy this issue and further improve GCL in SSNC, we propose the Topology InFormation gain-Aware Graph Contrastive Learning (TIFA-GCL) framework that considers the annotated information distribution across graph in GCL. Extensive experiments on six benchmark graph datasets, including the enormous OGB-Products graph, show that TIFA-GCL can bring a larger improvement than existing GCL methods in both transductive and inductive settings. Further experiments demonstrate the generalizability and interpretability of TIFA-GCL.  ( 2 min )
    Towards assessing agricultural land suitability with causal machine learning. (arXiv:2204.12956v1 [cs.LG])
    Understanding the suitability of agricultural land for applying specific management practices is of great importance for sustainable and resilient agriculture against climate change. Recent developments in the field of causal machine learning enable the estimation of intervention impacts on an outcome of interest, for samples described by a set of observed characteristics. We introduce an extensible data-driven framework that leverages earth observations and frames agricultural land suitability as a geospatial impact assessment problem, where the estimated effects of agricultural practices on agroecosystems serve as a land suitability score and guide decision making. We formulate this as a causal machine learning task and discuss how this approach can be used for agricultural planning in a changing climate. Specifically, we extract the agricultural management practices of "crop rotation" and "landscape crop diversity" from crop type maps, account for climate and land use data, and use double machine learning to estimate their heterogeneous effect on Net Primary Productivity (NPP), within the Flanders region of Belgium from 2010 to 2020. We find that the effect of crop rotation was insignificant, while landscape crop diversity had a small negative effect on NPP. Finally, we observe considerable effect heterogeneity in space for both practices and analyze it.  ( 2 min )
    Differentially Quantized Gradient Methods. (arXiv:2002.02508v4 [cs.LG] UPDATED)
    Consider the following distributed optimization scenario. A worker has access to training data that it uses to compute the gradients while a server decides when to stop iterative computation based on its target accuracy or delay constraints. The server receives all its information about the problem instance from the worker via a rate-limited noiseless communication channel. We introduce the principle we call Differential Quantization (DQ) that prescribes compensating the past quantization errors to direct the descent trajectory of a quantized algorithm towards that of its unquantized counterpart. Assuming that the objective function is smooth and strongly convex, we prove that Differentially Quantized Gradient Descent (DQ-GD) attains a linear contraction factor of $\max\{\sigma_{\mathrm{GD}}, \rho_n 2^{-R}\}$, where $\sigma_{\mathrm{GD}}$ is the contraction factor of unquantized gradient descent (GD), $\rho_n \geq 1$ is the covering efficiency of the quantizer, and $R$ is the bitrate per problem dimension $n$. Thus at any $R\geq\log_2 \rho_n /\sigma_{\mathrm{GD}}$ bits, the contraction factor of DQ-GD is the same as that of unquantized GD, i.e., there is no loss due to quantization. We show that no algorithm within a certain class can converge faster than $\max\{\sigma_{\mathrm{GD}}, 2^{-R}\}$. Since quantizers exist with $\rho_n \to 1$ as $n \to \infty$ (Rogers, 1963), this means that DQ-GD is asymptotically optimal. The principle of differential quantization continues to apply to gradient methods with momentum such as Nesterov's accelerated gradient descent, and Polyak's heavy ball method. For these algorithms as well, if the rate is above a certain threshold, there is no loss in contraction factor obtained by the differentially quantized algorithm compared to its unquantized counterpart. Experimental results on least-squares problems validate our theoretical analysis.  ( 3 min )
    Network Classification Based Structural Analysis of Real Networks and their Model-Generated Counterparts. (arXiv:1810.08498v4 [cs.SI] UPDATED)
    Data-driven analysis of complex networks has been in the focus of research for decades. An important area of research is to study how well real networks can be described with a small selection of metrics, furthermore how well network models can capture the relations between graph metrics observed in real networks. In this paper, we apply machine learning techniques to investigate the aforementioned problems. We study 500 real-world networks along with 2,000 synthetic networks generated by four frequently used network models with previously calibrated parameters to make the generated graphs as similar to the real networks as possible. This paper unifies several branches of data-driven complex network analysis, such as the study of graph metrics and their pair-wise relationships, network similarity estimation, model calibration, and graph classification. We find that the correlation profiles of the structural measures significantly differ across network domains and the domain can be efficiently determined using a small selection of graph metrics. The structural properties of the network models with fixed parameters are robust enough to perform parameter calibration. The goodness-of-fit of the network models highly depends on the network domain. By solving classification problems, we find that the models lack the capability of generating a graph with a high clustering coefficient and relatively large diameter simultaneously. On the other hand, models are able to capture exactly the degree-distribution-related metrics.  ( 2 min )
    Sequence-Based Target Coin Prediction for Cryptocurrency Pump-and-Dump. (arXiv:2204.12929v1 [q-fin.ST])
    As the pump-and-dump schemes (P&Ds) proliferate in the cryptocurrency market, it becomes imperative to detect such fraudulent activities in advance, to inform potentially susceptible investors before they become victims. In this paper, we focus on the target coin prediction task, i.e., to predict the pump probability of all coins listed in the target exchange before a pump. We conduct a comprehensive study of the latest P&Ds, investigate 709 events organized in Telegram channels from Jan. 2019 to Jan. 2022, and unearth some abnormal yet interesting patterns of P&Ds. Empirical analysis demonstrates that pumped coins exhibit intra-channel homogeneity and inter-channel heterogeneity, which inspires us to develop a novel sequence-based neural network named SNN. Specifically, SNN encodes each channel's pump history as a sequence representation via a positional attention mechanism, which filters useful information and alleviates the noise introduced when the sequence length is long. We also identify and address the coin-side cold-start problem in a practical setting. Extensive experiments show a lift of 1.6% AUC and 41.0% Hit Ratio@3 brought by our method, making it well-suited for real-world application. As a side contribution, we release the source code of our entire data science pipeline on GitHub, along with the dataset tailored for studying the latest P&Ds.  ( 2 min )
    Accurate inference of crowdsourcing properties when using efficient allocation strategies. (arXiv:1903.03104v2 [cs.LG] UPDATED)
    Allocation strategies improve the efficiency of crowdsourcing by decreasing the work needed to complete individual tasks accurately. However, these algorithms introduce bias by preferentially allocating workers onto easy tasks, leading to sets of completed tasks that are no longer representative of all tasks. This bias challenges inference of problem-wide properties such as typical task difficulty or crowd properties such as worker completion times, important information that goes beyond the crowd responses themselves. Here we study inference about problem properties when using an allocation algorithm to improve crowd efficiency. We introduce Decision-Explicit Probability Sampling (DEPS), a novel method to perform inference of problem properties while accounting for the potential bias introduced by an allocation strategy. Experiments on real and synthetic crowdsourcing data show that DEPS outperforms baseline inference methods while still leveraging the efficiency gains of the allocation method. The ability to perform accurate inference of general properties when using non-representative data allows crowdsourcers to extract more knowledge out of a given crowdsourced dataset.  ( 2 min )
    An Iterative Labeling Method for Annotating Fisheries Imagery. (arXiv:2204.12934v1 [cs.LG])
    In this paper, we present a methodology for fisheries-related data that allows us to converge on a labeled image dataset by iterating over the dataset with multiple training and production loops that can exploit crowdsourcing interfaces. We present our algorithm and its results on two separate sets of image data collected using the Seabed autonomous underwater vehicle. The first dataset comprises of 2,026 completely unlabeled images, while the second consists of 21,968 images that were point annotated by experts. Our results indicate that training with a small subset and iterating on that to build a larger set of labeled data allows us to converge to a fully annotated dataset with a small number of iterations. Even in the case of a dataset labeled by experts, a single iteration of the methodology improves the labels by discovering additional complicated examples of labels associated with fish that overlap, are very small, or obscured by the contrast limitations associated with underwater imagery.  ( 2 min )
    Treating Crowdsourcing as Examination: How to Score Tasks and Online Workers?. (arXiv:2204.13065v1 [cs.HC])
    Crowdsourcing is an online outsourcing mode which can solve the current machine learning algorithm's urge need for massive labeled data. Requester posts tasks on crowdsourcing platforms, which employ online workers over the Internet to complete tasks, then aggregate and return results to requester. How to model the interaction between different types of workers and tasks is a hot spot. In this paper, we try to model workers as four types based on their ability: expert, normal worker, sloppy worker and spammer, and divide tasks into hard, medium and easy task according to their difficulty. We believe that even experts struggle with difficult tasks while sloppy workers can get easy tasks right, and spammers always give out wrong answers deliberately. So, good examination tasks should have moderate degree of difficulty and discriminability to score workers more objectively. Thus, we first score workers' ability mainly on the medium difficult tasks, then reducing the weight of answers from sloppy workers and modifying the answers from spammers when inferring the tasks' ground truth. A probability graph model is adopted to simulate the task execution process, and an iterative method is adopted to calculate and update the ground truth, the ability of workers and the difficulty of the task successively. We verify the rightness and effectiveness of our algorithm both in simulated and real crowdsourcing scenes.  ( 2 min )
    NFT Appraisal Prediction: Utilizing Search Trends, Public Market Data, Linear Regression and Recurrent Neural Networks. (arXiv:2204.12932v1 [q-fin.ST])
    In this paper we investigate the correlation between NFT valuations and various features from three primary categories: public market data, NFT metadata, and social trends data.  ( 2 min )
    Bisimulation Makes Analogies in Goal-Conditioned Reinforcement Learning. (arXiv:2204.13060v1 [cs.LG])
    Building generalizable goal-conditioned agents from rich observations is a key to reinforcement learning (RL) solving real world problems. Traditionally in goal-conditioned RL, an agent is provided with the exact goal they intend to reach. However, it is often not realistic to know the configuration of the goal before performing a task. A more scalable framework would allow us to provide the agent with an example of an analogous task, and have the agent then infer what the goal should be for its current state. We propose a new form of state abstraction called goal-conditioned bisimulation that captures functional equivariance, allowing for the reuse of skills to achieve new goals. We learn this representation using a metric form of this abstraction, and show its ability to generalize to new goals in simulation manipulation tasks. Further, we prove that this learned representation is sufficient not only for goal conditioned tasks, but is amenable to any downstream task described by a state-only reward function. Videos can be found at https://sites.google.com/view/gc-bisimulation.  ( 2 min )
    Faster online calibration without randomization: interval forecasts and the power of two choices. (arXiv:2204.13087v1 [cs.LG])
    We study the problem of making calibrated probabilistic forecasts for a binary sequence generated by an adversarial nature. Following the seminal paper of Foster and Vohra (1998), nature is often modeled as an adaptive adversary who sees all activity of the forecaster except the randomization that the forecaster may deploy. A number of papers have proposed randomized forecasting strategies that achieve an $\epsilon$-calibration error rate of $O(1/\sqrt{T})$, which we prove is tight in general. On the other hand, it is well known that it is not possible to be calibrated without randomization, or if nature also sees the forecaster's randomization; in both cases the calibration error could be $\Omega(1)$. Inspired by the equally seminal works on the "power of two choices" and imprecise probability theory, we study a small variant of the standard online calibration problem. The adversary gives the forecaster the option of making two nearby probabilistic forecasts, or equivalently an interval forecast of small width, and the endpoint closest to the revealed outcome is used to judge calibration. This power of two choices, or imprecise forecast, accords the forecaster with significant power -- we show that a faster $\epsilon$-calibration rate of $O(1/T)$ can be achieved even without deploying any randomization.  ( 2 min )
    Can deep learning match the efficiency of human visual long-term memory to store object details?. (arXiv:2204.13061v1 [cs.LG])
    Humans have a remarkably large capacity to store detailed visual information in long-term memory even after a single exposure, as demonstrated by classic experiments in psychology. For example, Standing (1973) showed that humans could recognize with high accuracy thousands of pictures that they had seen only once a few days prior to a recognition test. In deep learning, the primary mode of incorporating new information into a model is through gradient descent in the model's parameter space. This paper asks whether deep learning via gradient descent can match the efficiency of human visual long-term memory to incorporate new information in a rigorous, head-to-head, quantitative comparison. We answer this in the negative: even in the best case, models learning via gradient descent appear to require approximately 10 exposures to the same visual materials in order to reach a recognition memory performance humans achieve after only a single exposure. Prior knowledge induced via pretraining and bigger model sizes improve performance, but these improvements are not very visible after a single exposure (it takes a few exposures for the improvements to become apparent), suggesting that simply scaling up the pretraining data size or model size might not be enough for the model to reach human-level memory efficiency.  ( 2 min )
    NLU++: A Multi-Label, Slot-Rich, Generalisable Dataset for Natural Language Understanding in Task-Oriented Dialogue. (arXiv:2204.13021v1 [cs.CL])
    We present NLU++, a novel dataset for natural language understanding (NLU) in task-oriented dialogue (ToD) systems, with the aim to provide a much more challenging evaluation environment for dialogue NLU models, up to date with the current application and industry requirements. NLU++ is divided into two domains (BANKING and HOTELS) and brings several crucial improvements over current commonly used NLU datasets. \textbf{1)} NLU++ provides fine-grained domain ontologies with a large set of challenging \textit{multi-intent} sentences, introducing and validating the idea of \textit{intent modules} that can be combined into complex intents that convey complex user goals, combined with finer-grained and thus more challenging slot sets. \textbf{2)} The ontology is divided into \textit{domain-specific} and \textit{generic} (i.e., domain-universal) intent modules that overlap across domains, promoting cross-domain reusability of annotated examples. \textbf{3)} The dataset design has been inspired by the problems observed in industrial ToD systems, and \textbf{4)} it has been collected, filtered and carefully annotated by dialogue NLU experts, yielding high-quality annotated data. Finally, we benchmark a series of current state-of-the-art NLU models on NLU++; the results demonstrate the challenging nature of the dataset, especially in low-data regimes, the validity of `intent modularisation', and call for further research on ToD NLU.  ( 2 min )
    Dropout Inference with Non-Uniform Weight Scaling. (arXiv:2204.13047v1 [cs.LG])
    Dropout as regularization has been used extensively to prevent overfitting for training neural networks. During training, units and their connections are randomly dropped, which could be considered as sampling many different submodels from the original model. At test time, weight scaling and Monte Carlo approximation are two widely applied approaches to approximate the outputs. Both approaches work well practically when all submodels are low-bias complex learners. However, in this work, we demonstrate scenarios where some submodels behave closer to high-bias models and a non-uniform weight scaling is a better approximation for inference.  ( 2 min )
    Binding Actions to Objects in World Models. (arXiv:2204.13022v1 [cs.LG])
    We study the problem of binding actions to objects in object-factored world models using action-attention mechanisms. We propose two attention mechanisms for binding actions to objects, soft attention and hard attention, which we evaluate in the context of structured world models for five environments. Our experiments show that hard attention helps contrastively-trained structured world models to learn to separate individual objects in an object-based grid-world environment. Further, we show that soft attention increases performance of factored world models trained on a robotic manipulation task. The learned action attention weights can be used to interpret the factored world model as the attention focuses on the manipulated object in the environment.  ( 2 min )
    Unsupervised Learning of Unbiased Visual Representations. (arXiv:2204.12941v1 [cs.LG])
    Deep neural networks are known for their inability to learn robust representations when biases exist in the dataset. This results in a poor generalization to unbiased datasets, as the predictions strongly rely on peripheral and confounding factors, which are erroneously learned by the network. Many existing works deal with this issue by either employing an explicit supervision on the bias attributes, or assuming prior knowledge about the bias. In this work we study this problem in a more difficult scenario, in which no explicit annotation about the bias is available, and without any prior knowledge about its nature. We propose a fully unsupervised debiasing framework, consisting of three steps: first, we exploit the natural preference for learning malignant biases, obtaining a bias-capturing model; then, we perform a pseudo-labelling step to obtain bias labels; finally we employ state-of-the-art supervised debiasing techniques to obtain an unbiased model. We also propose a theoretical framework to assess the biasness of a model, and provide a detailed analysis on how biases affect the training of neural networks. We perform experiments on synthetic and real-world datasets, showing that our method achieves state-of-the-art performance in a variety of settings, sometimes even higher than fully supervised debiasing approaches.  ( 2 min )
    Learning to Transfer Role Assignment Across Team Sizes. (arXiv:2204.12937v1 [cs.LG])
    Multi-agent reinforcement learning holds the key for solving complex tasks that demand the coordination of learning agents. However, strong coordination often leads to expensive exploration over the exponentially large state-action space. A powerful approach is to decompose team works into roles, which are ideally assigned to agents with the relevant skills. Training agents to adaptively choose and play emerging roles in a team thus allows the team to scale to complex tasks and quickly adapt to changing environments. These promises, however, have not been fully realised by current role-based multi-agent reinforcement learning methods as they assume either a pre-defined role structure or a fixed team size. We propose a framework to learn role assignment and transfer across team sizes. In particular, we train a role assignment network for small teams by demonstration and transfer the network to larger teams, which continue to learn through interaction with the environment. We demonstrate that re-using the role-based credit assignment structure can foster the learning process of larger reinforcement learning teams to achieve tasks requiring different roles. Our proposal outperforms competing techniques in enriched role-enforcing Prey-Predator games and in new scenarios in the StarCraft II Micro-Management benchmark.  ( 2 min )
    Multi-Objective Physics-Guided Recurrent Neural Networks for Identifying Non-Autonomous Dynamical Systems. (arXiv:2204.12972v1 [eess.SY])
    While trade-offs between modeling effort and model accuracy remain a major concern with system identification, resorting to data-driven methods often leads to a complete disregard for physical plausibility. To address this issue, we propose a physics-guided hybrid approach for modeling non-autonomous systems under control. Starting from a traditional physics-based model, this is extended by a recurrent neural network and trained using a sophisticated multi-objective strategy yielding physically plausible models. While purely data-driven methods fail to produce satisfying results, experiments conducted on real data reveal substantial accuracy improvements by our approach compared to a physics-based model.  ( 2 min )
    Domain Knowledge-Infused Deep Learning for Automated Analog/Radio-Frequency Circuit Parameter Optimization. (arXiv:2204.12948v1 [cs.LG])
    The design automation of analog circuits is a longstanding challenge. This paper presents a reinforcement learning method enhanced by graph learning to automate the analog circuit parameter optimization at the pre-layout stage, i.e., finding device parameters to fulfill desired circuit specifications. Unlike all prior methods, our approach is inspired by human experts who rely on domain knowledge of analog circuit design (e.g., circuit topology and couplings between circuit specifications) to tackle the problem. By originally incorporating such key domain knowledge into policy training with a multimodal network, the method best learns the complex relations between circuit parameters and design targets, enabling optimal decisions in the optimization process. Experimental results on exemplary circuits show it achieves human-level design accuracy (99%) 1.5X efficiency of existing best-performing methods. Our method also shows better generalization ability to unseen specifications and optimality in circuit performance optimization. Moreover, it applies to design radio-frequency circuits on emerging semiconductor technologies, breaking the limitations of prior learning methods in designing conventional analog circuits.  ( 2 min )
    Meshless method stencil evaluation with machine learning. (arXiv:2204.12940v1 [cs.LG])
    Meshless methods are an active and modern branch of numerical analysis with many intriguing benefits. One of the main open research questions related to local meshless methods is how to select the best possible stencil - a collection of neighbouring nodes - to base the calculation on. In this paper, we describe the procedure for generating a labelled stencil dataset and use a variation of pointNet - a deep learning network based on point clouds - to create a classifier for the quality of the stencil. We exploit features of pointNet to implement a model that can be used to classify differently sized stencils and compare it against models dedicated to a single stencil size. The model is particularly good at detecting the best and the worst stencils with a respectable area under the curve (AUC) metric of around 0.90. There is much potential for further improvement and direct application in the meshless domain.  ( 2 min )
    GypSum: Learning Hybrid Representations for Code Summarization. (arXiv:2204.12916v1 [cs.SE])
    Code summarization with deep learning has been widely studied in recent years. Current deep learning models for code summarization generally follow the principle in neural machine translation and adopt the encoder-decoder framework, where the encoder learns the semantic representations from source code and the decoder transforms the learnt representations into human-readable text that describes the functionality of code snippets. Despite they achieve the new state-of-the-art performance, we notice that current models often either generate less fluent summaries, or fail to capture the core functionality, since they usually focus on a single type of code representations. As such we propose GypSum, a new deep learning model that learns hybrid representations using graph attention neural networks and a pre-trained programming and natural language model. We introduce particular edges related to the control flow of a code snippet into the abstract syntax tree for graph construction, and design two encoders to learn from the graph and the token sequence of source code, respectively. We modify the encoder-decoder sublayer in the Transformer's decoder to fuse the representations and propose a dual-copy mechanism to facilitate summary generation. Experimental results demonstrate the superior performance of GypSum over existing code summarization models.  ( 2 min )
    A Bayesian Approach To Graph Partitioning. (arXiv:2204.12927v1 [cs.LG])
    A new algorithm based on bayesian inference for learning local graph conductance based on Gaussian Process(GP) is given that uses advanced MCMC convergence ideas to create a scalable and fast algorithm for convergence to stationary distribution which is provided to learn the bahavior of conductance when traversing the indirected weighted graph. First metric embedding is used to represent the vertices of the graph. Then, uniform induced conductance is calculated for training points. Finally, in the learning step, a gaussian process is used to approximate the uniform induced conductance. MCMC is used to measure uncertainty of estimated hyper-parameters.  ( 2 min )
    Using the Projected Belief Network at High Dimensions. (arXiv:2204.12922v1 [cs.LG])
    The projected belief network (PBN) is a layered generative network (LGN) with tractable likelihood function, and is based on a feed-forward neural network (FFNN). There are two versions of the PBN: stochastic and deterministic (D-PBN), and each has theoretical advantages over other LGNs. However, implementation of the PBN requires an iterative algorithm that includes the inversion of a symmetric matrix of size M X M in each layer, where M is the layer output dimension. This, and the fact that the network must be always dimension-reducing in each layer, can limit the types of problems where the PBN can be applied. In this paper, we describe techniques to avoid or mitigate these restrictions and use the PBN effectively at high dimension. We apply the discriminatively aligned PBN (PBN-DA) to classifying and auto-encoding high-dimensional spectrograms of acoustic events. We also present the discriminatively aligned D-PBN for the first time.  ( 2 min )
    Discovering Quantum Phase Transitions with Fermionic Neural Networks. (arXiv:2202.05183v2 [physics.comp-ph] UPDATED)
    Deep neural networks have been extremely successful as highly accurate wave function ans\"atze for variational Monte Carlo calculations of molecular ground states. We present an extension of one such ansatz, FermiNet, to calculations of the ground states of periodic Hamiltonians, and study the homogeneous electron gas. FermiNet calculations of the ground-state energies of small electron gas systems are in excellent agreement with previous initiator full configuration interaction quantum Monte Carlo and diffusion Monte Carlo calculations. We investigate the spin-polarized homogeneous electron gas and demonstrate that the same neural network architecture is capable of accurately representing both the delocalized Fermi liquid state and the localized Wigner crystal state. The network is given no \emph{a priori} knowledge that a phase transition exists, but converges on the translationally invariant ground state at high density and spontaneously breaks the symmetry to produce the crystalline ground state at low density.  ( 2 min )
    Variance-Reduced Heterogeneous Federated Learning via Stratified Client Selection. (arXiv:2201.05762v2 [cs.LG] UPDATED)
    Client selection strategies are widely adopted to handle the communication-efficient problem in recent studies of Federated Learning (FL). However, due to the large variance of the selected subset's update, prior selection approaches with a limited sampling ratio cannot perform well on convergence and accuracy in heterogeneous FL. To address this problem, in this paper, we propose a novel stratified client selection scheme to reduce the variance for the pursuit of better convergence and higher accuracy. Specifically, to mitigate the impact of heterogeneity, we develop stratification based on clients' local data distribution to derive approximate homogeneous strata for better selection in each stratum. Concentrating on a limited sampling ratio scenario, we next present an optimized sample size allocation scheme by considering the diversity of stratum's variability, with the promise of further variance reduction. Theoretically, we elaborate the explicit relation among different selection schemes with regard to variance, under heterogeneous settings, we demonstrate the effectiveness of our selection scheme. Experimental results confirm that our approach not only allows for better performance relative to state-of-the-art methods but also is compatible with prevalent FL algorithms.  ( 2 min )
    Explainable k-means. Don't be greedy, plant bigger trees!. (arXiv:2111.03193v2 [cs.LG] UPDATED)
    We provide a new bi-criteria $\tilde{O}(\log^2 k)$ competitive algorithm for explainable $k$-means clustering. Explainable $k$-means was recently introduced by Dasgupta, Frost, Moshkovitz, and Rashtchian (ICML 2020). It is described by an easy to interpret and understand (threshold) decision tree or diagram. The cost of the explainable $k$-means clustering equals to the sum of costs of its clusters; and the cost of each cluster equals the sum of squared distances from the points in the cluster to the center of that cluster. The best non bi-criteria algorithm for explainable clustering $\tilde{O}(k)$ competitive, and this bound is tight. Our randomized bi-criteria algorithm constructs a threshold decision tree that partitions the data set into $(1+\delta)k$ clusters (where $\delta\in (0,1)$ is a parameter of the algorithm). The cost of this clustering is at most $\tilde{O}(1/ \delta \cdot \log^2 k)$ times the cost of the optimal unconstrained $k$-means clustering. We show that this bound is almost optimal.  ( 2 min )
    Certified Robustness via Randomized Smoothing over Multiplicative Parameters. (arXiv:2106.14432v2 [cs.LG] UPDATED)
    Currently the most popular method of providing robustness certificates is randomized smoothing where an input is smoothed via some probability distribution. We propose a novel approach to randomized smoothing over multiplicative parameters. Using this method we construct certifiably robust classifiers with respect to a gamma correction perturbation and compare the result with classifiers obtained via other smoothing distributions (Gaussian, Laplace, uniform). The experiments show that asymmetrical Rayleigh distribution allows to obtain better certificates for some values of perturbation parameters. To the best of our knowledge it is the first work concerning certified robustness against the multiplicative gamma correction transformation and the first to study effects of asymmetrical distributions in randomized smoothing.  ( 2 min )
    Generalization Bounds with Minimal Dependency on Hypothesis Class via Distributionally Robust Optimization. (arXiv:2106.11180v2 [math.OC] UPDATED)
    Established approaches to obtain generalization bounds in data-driven optimization and machine learning mostly build on solutions from empirical risk minimization (ERM), which depend crucially on the functional complexity of the hypothesis class. In this paper, we present an alternate route to obtain these bounds on the solution from distributionally robust optimization (DRO), a recent data-driven optimization framework based on worst-case analysis and the notion of ambiguity set to capture statistical uncertainty. In contrast to the hypothesis class complexity in ERM, our DRO bounds depend on the ambiguity set geometry and its compatibility with the true loss function. Notably, when using maximum mean discrepancy as a DRO distance metric, our analysis implies generalization bounds that depend solely on the true loss function. To the best of our knowledge, it is the first generalization bound in the literature that is entirely independent of any other candidates in the hypothesis class. We hope our findings can open the door for a better understanding of DRO, especially its benefits on loss minimization and other machine learning applications.  ( 2 min )
    BINAS: Bilinear Interpretable Neural Architecture Search. (arXiv:2110.12399v3 [cs.LG] UPDATED)
    Practical use of neural networks often involves requirements on latency, energy and memory among others. A popular approach to find networks under such requirements is through constrained Neural Architecture Search (NAS). However, previous methods use complicated predictors for the accuracy of the network. Those predictors are hard to interpret and sensitive to many hyperparameters to be tuned, hence, the resulting accuracy of the generated models is often harmed. In this work we resolve this by introducing Bilinear Interpretable Neural Architecture Search (BINAS), that is based on an accurate and simple bilinear formulation of both an accuracy estimator and the expected resource requirement, together with a scalable search method with theoretical guarantees. The simplicity of our proposed estimator together with the intuitive way it is constructed bring interpretability through many insights about the contribution of different design choices. For example, we find that in the examined search space, adding depth and width is more effective at deeper stages of the network and at the beginning of each resolution stage. Our experiments show that BINAS generates comparable to or better architectures than other state-of-the-art NAS methods within a reduced marginal search cost, while strictly satisfying the resource constraints.  ( 2 min )
    Online Deep Learning from Doubly-Streaming Data. (arXiv:2204.11793v2 [cs.LG] UPDATED)
    This paper investigates a new online learning problem with doubly-streaming data, where the data streams are described by feature spaces that constantly evolve, with new features emerging and old features fading away. The challenges of this problem are two folds: 1) Data samples ceaselessly flowing in may carry shifted patterns over time, requiring learners to update hence adapt on-the-fly. 2) Newly emerging features are described by very few samples, resulting in weak learners that tend to make error predictions. A plausible idea to overcome the challenges is to establish relationship between the pre-and-post evolving feature spaces, so that an online learner can leverage the knowledge learned from the old features to better the learning performance on the new features. Unfortunately, this idea does not scale up to high-dimensional media streams with complex feature interplay, which suffers an tradeoff between onlineness (biasing shallow learners) and expressiveness(requiring deep learners). Motivated by this, we propose a novel OLD^3S paradigm, where a shared latent subspace is discovered to summarize information from the old and new feature spaces, building intermediate feature mapping relationship. A key trait of OLD^3S is to treat the model capacity as a learnable semantics, yields optimal model depth and parameters jointly, in accordance with the complexity and non-linearity of the input data streams in an online fashion. Both theoretical analyses and empirical studies substantiate the viability and effectiveness of our proposal.  ( 2 min )
    TERMinator: A Neural Framework for Structure-Based Protein Design using Tertiary Repeating Motifs. (arXiv:2204.13048v1 [q-bio.BM])
    Computational protein design has the potential to deliver novel molecular structures, binders, and catalysts for myriad applications. Recent neural graph-based models that use backbone coordinate-derived features show exceptional performance on native sequence recovery tasks and are promising frameworks for design. A statistical framework for modeling protein sequence landscapes using Tertiary Motifs (TERMs), compact units of recurring structure in proteins, has also demonstrated good performance on protein design tasks. In this work, we investigate the use of TERM-derived data as features in neural protein design frameworks. Our graph-based architecture, TERMinator, incorporates TERM-based and coordinate-based information and outputs a Potts model over sequence space. TERMinator outperforms state-of-the-art models on native sequence recovery tasks, suggesting that utilizing TERM-based and coordinate-based features together is beneficial for protein design.  ( 2 min )
    An Empirical Evaluation of Flow Based Programming in the Machine Learning Deployment Context. (arXiv:2204.12781v1 [cs.SE])
    As use of data driven technologies spreads, software engineers are more often faced with the task of solving a business problem using data-driven methods such as machine learning (ML) algorithms. Deployment of ML within large software systems brings new challenges that are not addressed by standard engineering practices and as a result businesses observe high rate of ML deployment project failures. Data Oriented Architecture (DOA) is an emerging approach that can support data scientists and software developers when addressing such challenges. However, there is a lack of clarity about how DOA systems should be implemented in practice. This paper proposes to consider Flow-Based Programming (FBP) as a paradigm for creating DOA applications. We empirically evaluate FBP in the context of ML deployment on four applications that represent typical data science projects. We use Service Oriented Architecture (SOA) as a baseline for comparison. Evaluation is done with respect to different application domains, ML deployment stages, and code quality metrics. Results reveal that FBP is a suitable paradigm for data collection and data science tasks, and is able to simplify data collection and discovery when compared with SOA. We discuss the advantages of FBP as well as the gaps that need to be addressed to increase FBP adoption as a standard design paradigm for DOA.  ( 2 min )
    MAPLE-Edge: A Runtime Latency Predictor for Edge Devices. (arXiv:2204.12950v1 [cs.LG])
    Neural Architecture Search (NAS) has enabled automatic discovery of more efficient neural network architectures, especially for mobile and embedded vision applications. Although recent research has proposed ways of quickly estimating latency on unseen hardware devices with just a few samples, little focus has been given to the challenges of estimating latency on runtimes using optimized graphs, such as TensorRT and specifically for edge devices. In this work, we propose MAPLE-Edge, an edge device-oriented extension of MAPLE, the state-of-the-art latency predictor for general purpose hardware, where we train a regression network on architecture-latency pairs in conjunction with a hardware-runtime descriptor to effectively estimate latency on a diverse pool of edge devices. Compared to MAPLE, MAPLE-Edge can describe the runtime and target device platform using a much smaller set of CPU performance counters that are widely available on all Linux kernels, while still achieving up to +49.6% accuracy gains against previous state-of-the-art baseline methods on optimized edge device runtimes, using just 10 measurements from an unseen target device. We also demonstrate that unlike MAPLE which performs best when trained on a pool of devices sharing a common runtime, MAPLE-Edge can effectively generalize across runtimes by applying a trick of normalizing performance counters by the operator latency, in the measured hardware-runtime descriptor. Lastly, we show that for runtimes exhibiting lower than desired accuracy, performance can be boosted by collecting additional samples from the target device, with an extra 90 samples translating to gains of nearly +40%.  ( 2 min )
    Uncertainty-Aware Prediction of Battery Energy Consumption for Hybrid Electric Vehicles. (arXiv:2204.12825v1 [cs.LG])
    The usability of vehicles is highly dependent on their energy consumption. In particular, one of the main factors hindering the mass adoption of electric (EV), hybrid (HEV), and plug-in hybrid (PHEV) vehicles is range anxiety, which occurs when a driver is uncertain about the availability of energy for a given trip. To tackle this problem, we propose a machine learning approach for modeling the battery energy consumption. By reducing predictive uncertainty, this method can help increase trust in the vehicle's performance and thus boost its usability. Most related work focuses on physical and/or chemical models of the battery that affect the energy consumption. We propose a data-driven approach which relies on real-world datasets including battery related attributes. Our approach showed an improvement in terms of predictive uncertainty as well as in accuracy compared to traditional methods.  ( 2 min )
    Adaptable Text Matching via Meta-Weight Regulator. (arXiv:2204.12668v1 [cs.IR])
    Neural text matching models have been used in a range of applications such as question answering and natural language inference, and have yielded a good performance. However, these neural models are of a limited adaptability, resulting in a decline in performance when encountering test examples from a different dataset or even a different task. The adaptability is particularly important in the few-shot setting: in many cases, there is only a limited amount of labeled data available for a target dataset or task, while we may have access to a richly labeled source dataset or task. However, adapting a model trained on the abundant source data to a few-shot target dataset or task is challenging. To tackle this challenge, we propose a Meta-Weight Regulator (MWR), which is a meta-learning approach that learns to assign weights to the source examples based on their relevance to the target loss. Specifically, MWR first trains the model on the uniformly weighted source examples, and measures the efficacy of the model on the target examples via a loss function. By iteratively performing a (meta) gradient descent, high-order gradients are propagated to the source examples. These gradients are then used to update the weights of source examples, in a way that is relevant to the target performance. As MWR is model-agnostic, it can be applied to any backbone neural model. Extensive experiments are conducted with various backbone text matching models, on four widely used datasets and two tasks. The results demonstrate that our proposed approach significantly outperforms a number of existing adaptation methods and effectively improves the cross-dataset and cross-task adaptability of the neural text matching models in the few-shot setting.  ( 2 min )
    Topological Data Analysis for Anomaly Detection in Host-Based Logs. (arXiv:2204.12919v1 [cs.LG])
    Topological Data Analysis (TDA) gives practioners the ability to analyse the global structure of cybersecurity data. We use TDA for anomaly detection in host-based logs collected with the open-source Logging Made Easy (LME) project. We present an approach that builds a filtration of simplicial complexes directly from Windows logs, enabling analysis of their intrinsic structure using topological tools. We compare the efficacy of persistent homology and the spectrum of graph and hypergraph Laplacians as feature vectors against a standard log embedding that counts events, and find that topological and spectral embeddings of computer logs contain discriminative information for classifying anomalous logs that is complementary to standard embeddings. We end by discussing the potential for our methods to be used as part of an explainable framework for anomaly detection.  ( 2 min )
    FlowGNN: A Dataflow Architecture for Universal Graph Neural Network Inference via Multi-Queue Streaming. (arXiv:2204.13103v1 [cs.DC])
    Graph neural networks (GNNs) have recently exploded in popularity thanks to their broad applicability to graph-related problems such as quantum chemistry, drug discovery, and high energy physics. However, meeting demand for novel GNN models and fast inference simultaneously is challenging because of the gap between developing efficient accelerators and the rapid creation of new GNN models. Prior art focuses on the acceleration of specific classes of GNNs, such as Graph Convolutional Network (GCN), but lacks the generality to support a wide range of existing or new GNN models. Meanwhile, most work rely on graph pre-processing to exploit data locality, making them unsuitable for real-time applications. To address these limitations, in this work, we propose a generic dataflow architecture for GNN acceleration, named FlowGNN, which can flexibly support the majority of message-passing GNNs. The contributions are three-fold. First, we propose a novel and scalable dataflow architecture, which flexibly supports a wide range of GNN models with message-passing mechanism. The architecture features a configurable dataflow optimized for simultaneous computation of node embedding, edge embedding, and message passing, which is generally applicable to all models. We also propose a rich library of model-specific components. Second, we deliver ultra-fast real-time GNN inference without any graph pre-processing, making it agnostic to dynamically changing graph structures. Third, we verify our architecture on the Xilinx Alveo U50 FPGA board and measure the on-board end-to-end performance. We achieve a speed-up of up to 51-254x against CPU (6226R) and 1.3-477x against GPU (A6000) (with batch sizes 1 through 1024); we also outperform the SOTA GNN accelerator I-GCN by 1.03x and 1.25x across two datasets. Our implementation code and on-board measurement are publicly available on GitHub.  ( 2 min )
    Trainable Compound Activation Functions for Machine Learning. (arXiv:2204.12920v1 [cs.LG])
    Activation functions (AF) are necessary components of neural networks that allow approximation of functions, but AFs in current use are usually simple monotonically increasing functions. In this paper, we propose trainable compound AF (TCA) composed of a sum of shifted and scaled simple AFs. TCAs increase the effectiveness of networks with fewer parameters compared to added layers. TCAs have a special interpretation in generative networks because they effectively estimate the marginal distributions of each dimension of the data using a mixture distribution, reducing modality and making linear dimension reduction more effective. When used in restricted Boltzmann machines (RBMs), they result in a novel type of RBM with mixture-based stochastic units. Improved performance is demonstrated in experiments using RBMs, deep belief networks (DBN), projected belief networks (PBN), and variational auto-encoders (VAE).  ( 2 min )
    Spending Privacy Budget Fairly and Wisely. (arXiv:2204.12903v1 [cs.LG])
    Differentially private (DP) synthetic data generation is a practical method for improving access to data as a means to encourage productive partnerships. One issue inherent to DP is that the "privacy budget" is generally "spent" evenly across features in the data set. This leads to good statistical parity with the real data, but can undervalue the conditional probabilities and marginals that are critical for predictive quality of synthetic data. Further, loss of predictive quality may be non-uniform across the data set, with subsets that correspond to minority groups potentially suffering a higher loss. In this paper, we develop ensemble methods that distribute the privacy budget "wisely" to maximize predictive accuracy of models trained on DP data, and "fairly" to bound potential disparities in accuracy across groups and reduce inequality. Our methods are based on the insights that feature importance can inform how privacy budget is allocated, and, further, that per-group feature importance and fairness-related performance objectives can be incorporated in the allocation. These insights make our methods tunable to social contexts, allowing data owners to produce balanced synthetic data for predictive analysis.  ( 2 min )
    First do no harm: counterfactual objective functions for safe & ethical AI. (arXiv:2204.12993v1 [cs.AI])
    To act safely and ethically in the real world, agents must be able to reason about harm and avoid harmful actions. In this paper we develop the first statistical definition of harm and a framework for factoring harm into algorithmic decisions. We argue that harm is fundamentally a counterfactual quantity, and show that standard machine learning algorithms are guaranteed to pursue harmful policies in certain environments. To resolve this, we derive a family of counterfactual objective functions that robustly mitigate for harm. We demonstrate our approach with a statistical model for identifying optimal drug doses. While identifying optimal doses using the causal treatment effect results in harmful treatment decisions, our counterfactual algorithm identifies doses that are far less harmful without sacrificing efficacy. Our results show that counterfactual reasoning is a key ingredient for safe and ethical AI.  ( 2 min )
    Ollivier-Ricci Curvature For Head Pose Estimation From a Single Image. (arXiv:2204.13006v1 [cs.CV])
    Head pose estimation is a crucial challenge for many real-world applications, such as attention and human behavior analysis. This paper aims to estimate head pose from a single image by applying notions of network curvature. In the real world, many complex networks have groups of nodes that are well connected to each other with significant functional roles. Similarly, the interactions of facial landmarks can be represented as complex dynamic systems modeled by weighted graphs. The functionalities of such systems are therefore intrinsically linked to the topology and geometry of the underlying graph. In this work, using the geometric notion of Ollivier-Ricci curvature (ORC) on weighted graphs as input to the XGBoost regression model, we show that the intrinsic geometric basis of ORC offers a natural approach to discovering underlying common structure within a pool of poses. Experiments on the BIWI, AFLW2000 and Pointing'04 datasets show that the ORC_XGB method performs well compared to state-of-the-art methods, both landmark-based and image-only.
    Scalable particle-based alternatives to EM. (arXiv:2204.12965v1 [stat.CO])
    Building on (Neal and Hinton, 1998), where the problem tackled by EM is recast as the optimization of a free energy functional on an infinite-dimensional space, we obtain three practical particle-based alternatives to EM applicable to broad classes of models. All three are derived through straightforward discretizations of gradient flows associated with the functional. The novel algorithms scale well to high-dimensional settings and outperform existing state-of-the-art methods in numerical experiments.
    Epicardial Adipose Tissue Segmentation from CT Images with A Semi-3D Neural Network. (arXiv:2204.12904v1 [eess.IV])
    Epicardial adipose tissue is a type of adipose tissue located between the heart wall and a protective layer around the heart called the pericardium. The volume and thickness of epicardial adipose tissue are linked to various cardiovascular diseases. It is shown to be an independent cardiovascular disease risk factor. Fully automatic and reliable measurements of epicardial adipose tissue from CT scans could provide better disease risk assessment and enable the processing of large CT image data sets for a systemic epicardial adipose tissue study. This paper proposes a method for fully automatic semantic segmentation of epicardial adipose tissue from CT images using a deep neural network. The proposed network uses a U-Net-based architecture with slice depth information embedded in the input image to segment a pericardium region of interest, which is used to obtain an epicardial adipose tissue segmentation. Image augmentation is used to increase model robustness. Cross-validation of the proposed method yields a Dice score of 0.86 on the CT scans of 20 patients.  ( 2 min )
    On the Dynamics of Inference and Learning. (arXiv:2204.12939v1 [cond-mat.dis-nn])
    Statistical Inference is the process of determining a probability distribution over the space of parameters of a model given a data set. As more data becomes available this probability distribution becomes updated via the application of Bayes' theorem. We present a treatment of this Bayesian updating process as a continuous dynamical system. Statistical inference is then governed by a first order differential equation describing a trajectory or flow in the information geometry determined by a parametric family of models. We solve this equation for some simple models and show that when the Cram\'{e}r-Rao bound is saturated the learning rate is governed by a simple $1/T$ power-law, with $T$ a time-like variable denoting the quantity of data. The presence of hidden variables can be incorporated in this setting, leading to an additional driving term in the resulting flow equation. We illustrate this with both analytic and numerical examples based on Gaussians and Gaussian Random Processes and inference of the coupling constant in the 1D Ising model. Finally we compare the qualitative behaviour exhibited by Bayesian flows to the training of various neural networks on benchmarked data sets such as MNIST and CIFAR10 and show how that for networks exhibiting small final losses the simple power-law is also satisfied.  ( 2 min )
    Improving Feature Generalizability with Multitask Learning in Class Incremental Learning. (arXiv:2204.12915v1 [cs.LG])
    Many deep learning applications, like keyword spotting, require the incorporation of new concepts (classes) over time, referred to as Class Incremental Learning (CIL). The major challenge in CIL is catastrophic forgetting, i.e., preserving as much of the old knowledge as possible while learning new tasks. Various techniques, such as regularization, knowledge distillation, and the use of exemplars, have been proposed to resolve this issue. However, prior works primarily focus on the incremental learning step, while ignoring the optimization during the base model training. We hypothesize that a more transferable and generalizable feature representation from the base model would be beneficial to incremental learning. In this work, we adopt multitask learning during base model training to improve the feature generalizability. Specifically, instead of training a single model with all the base classes, we decompose the base classes into multiple subsets and regard each of them as a task. These tasks are trained concurrently and a shared feature extractor is obtained for incremental learning. We evaluate our approach on two datasets under various configurations. The results show that our approach enhances the average incremental learning accuracy by up to 5.5%, which enables more reliable and accurate keyword spotting over time. Moreover, the proposed approach can be combined with many existing techniques and provides additional performance gain.  ( 2 min )
    Forecasting Foreign Exchange Rates With Parameter-Free Regression Networks Tuned By Bayesian Optimization. (arXiv:2204.12914v1 [q-fin.ST])
    The article is concerned with the problem of multi-step financial time series forecasting of Foreign Exchange (FX) rates. To address this problem, we introduce a parameter-free regression network termed RegPred Net. The exchange rate to forecast is treated as a stochastic process. It is assumed to follow a generalization of Brownian motion and the mean-reverting process referred to as the generalized Ornstein-Uhlenbeck (OU) process, with time-dependent coefficients. Using past observed values of the input time series, these coefficients can be regressed online by the cells of the first half of the network (Reg). The regressed coefficients depend only on - but are very sensitive to - a small number of hyperparameters required to be set by a global optimization procedure for which, Bayesian optimization is an adequate heuristic. Thanks to its multi-layered architecture, the second half of the regression network (Pred) can project time-dependent values for the OU process coefficients and generate realistic trajectories of the time series. Predictions can be easily derived in the form of expected values estimated by averaging values obtained by Monte Carlo simulation. The forecasting accuracy on a 100 days horizon is evaluated for several of the most important FX rates such as EUR/USD, EUR/CNY, and EUR/GBP. Our experimental results show that the RegPred Net significantly outperforms ARMA, ARIMA, LSTMs, and Autoencoder-LSTM models in this task.  ( 2 min )
    LiftPool: Lifting-based Graph Pooling for Hierarchical Graph Representation Learning. (arXiv:2204.12881v1 [cs.LG])
    Graph pooling has been increasingly considered for graph neural networks (GNNs) to facilitate hierarchical graph representation learning. Existing graph pooling methods commonly consist of two stages, i.e., selecting the top-ranked nodes and removing the rest nodes to construct a coarsened graph representation. However, local structural information of the removed nodes would be inevitably dropped in these methods, due to the inherent coupling of nodes (location) and their features (signals). In this paper, we propose an enhanced three-stage method via lifting, named LiftPool, to improve hierarchical graph representation by maximally preserving the local structural information in graph pooling. LiftPool introduces an additional stage of graph lifting before graph coarsening to preserve the local information of the removed nodes and decouple the processes of node removing and feature reduction. Specifically, for each node to be removed, its local information is obtained by subtracting the global information aggregated from its neighboring preserved nodes. Subsequently, this local information is aligned and propagated to the preserved nodes to alleviate information loss in graph coarsening. Furthermore, we demonstrate that the proposed LiftPool is localized and permutation-invariant. The proposed graph lifting structure is general to be integrated with existing downsampling-based graph pooling methods. Evaluations on benchmark graph datasets show that LiftPool substantially outperforms the state-of-the-art graph pooling methods in the task of graph classification.  ( 2 min )
    Performance and Interpretability Comparisons of Supervised Machine Learning Algorithms: An Empirical Study. (arXiv:2204.12868v1 [stat.ML])
    This paper compares the performances of three supervised machine learning algorithms in terms of predictive ability and model interpretation on structured or tabular data. The algorithms considered were scikit-learn implementations of extreme gradient boosting machines (XGB) and random forests (RFs), and feedforward neural networks (FFNNs) from TensorFlow. The paper is organized in a findings-based manner, with each section providing general conclusions supported by empirical results from simulation studies that cover a wide range of model complexity and correlation structures among predictors. We considered both continuous and binary responses of different sample sizes. Overall, XGB and FFNNs were competitive, with FFNNs showing better performance in smooth models and tree-based boosting algorithms performing better in non-smooth models. This conclusion held generally for predictive performance, identification of important variables, and determining correct input-output relationships as measured by partial dependence plots (PDPs). FFNNs generally had less over-fitting, as measured by the difference in performance between training and testing datasets. However, the difference with XGB was often small. RFs did not perform well in general, confirming the findings in the literature. All models exhibited different degrees of bias seen in PDPs, but the bias was especially problematic for RFs. The extent of the biases varied with correlation among predictors, response type, and data set sample size. In general, tree-based models tended to over-regularize the fitted model in the tails of predictor distributions. Finally, as to be expected, performances were better for continuous responses compared to binary data and with larger samples.  ( 2 min )
    Detecting Backdoor Poisoning Attacks on Deep Neural Networks by Heatmap Clustering. (arXiv:2204.12848v1 [cs.LG])
    Predicitions made by neural networks can be fraudulently altered by so-called poisoning attacks. A special case are backdoor poisoning attacks. We study suitable detection methods and introduce a new method called Heatmap Clustering. There, we apply a $k$-means clustering algorithm on heatmaps produced by the state-of-the-art explainable AI method Layer-wise relevance propagation. The goal is to separate poisoned from un-poisoned data in the dataset. We compare this method with a similar method, called Activation Clustering, which also uses $k$-means clustering but applies it on the activation of certain hidden layers of the neural network as input. We test the performance of both approaches for standard backdoor poisoning attacks, label-consistent poisoning attacks and label-consistent poisoning attacks with reduced amplitude stickers. We show that Heatmap Clustering consistently performs better than Activation Clustering. However, when considering label-consistent poisoning attacks, the latter method also yields good detection performance.  ( 2 min )
    Accelerating Robot Learning of Contact-Rich Manipulations: A Curriculum Learning Study. (arXiv:2204.12844v1 [cs.RO])
    The Reinforcement Learning (RL) paradigm has been an essential tool for automating robotic tasks. Despite the advances in RL, it is still not widely adopted in the industry due to the need for an expensive large amount of robot interaction with its environment. Curriculum Learning (CL) has been proposed to expedite learning. However, most research works have been only evaluated in simulated environments, from video games to robotic toy tasks. This paper presents a study for accelerating robot learning of contact-rich manipulation tasks based on Curriculum Learning combined with Domain Randomization (DR). We tackle complex industrial assembly tasks with position-controlled robots, such as insertion tasks. We compare different curricula designs and sampling approaches for DR. Based on this study, we propose a method that significantly outperforms previous work, which uses DR only (No CL is used), with less than a fifth of the training time (samples). Results also show that even when training only in simulation with toy tasks, our method can learn policies that can be transferred to the real-world robot. The learned policies achieved success rates of up to 86\% on real-world complex industrial insertion tasks (with tolerances of $\pm 0.01~mm$) not seen during the training.  ( 2 min )
    Transfer Learning with Pre-trained Conditional Generative Models. (arXiv:2204.12833v1 [cs.LG])
    Transfer learning is crucial in training deep neural networks on new target tasks. Current transfer learning methods generally assume at least one of (i) source and target task label spaces must overlap, (ii) source datasets are available, and (iii) target network architectures are consistent with source ones. However, these all assumptions are difficult to hold in practical settings because the target task rarely has the same labels as the source task, the source dataset access is restricted due to licensing and storage costs, and the target architecture is often specialized to each task. To transfer source knowledge without these assumptions, we propose a transfer learning method that uses deep generative models and is composed of the following two stages: pseudo pre-training (PP) and pseudo semi-supervised learning (P-SSL). PP trains a target architecture with a synthesized dataset by using conditional source generative models. P-SSL applies SSL algorithms to labeled target data and unlabeled pseudo samples, which are generated by cascading the source classifier and generative models to condition them with target samples. Our experimental results indicate that our method can outperform baselines of scratch training and knowledge distillation.  ( 2 min )
    Learning to Parallelize in a Shared-Memory Environment with Transformers. (arXiv:2204.12835v1 [cs.DC])
    In past years, the world has switched to many-core and multi-core shared memory architectures. As a result, there is a growing need to utilize these architectures by introducing shared memory parallelization schemes to software applications. OpenMP is the most comprehensive API that implements such schemes, characterized by a readable interface. Nevertheless, introducing OpenMP into code is challenging due to pervasive pitfalls in management of parallel shared memory. To facilitate the performance of this task, many source-to-source (S2S) compilers have been created over the years, tasked with inserting OpenMP directives into code automatically. In addition to having limited robustness to their input format, these compilers still do not achieve satisfactory coverage and precision in locating parallelizable code and generating appropriate directives. In this work, we propose leveraging recent advances in ML techniques, specifically in natural language processing (NLP), to replace S2S compilers altogether. We create a database (corpus), Open-OMP, specifically for this goal. Open-OMP contains over 28,000 code snippets, half of which contain OpenMP directives while the other half do not need parallelization at all with high probability. We use the corpus to train systems to automatically classify code segments in need of parallelization, as well as suggest individual OpenMP clauses. We train several transformer models, named PragFormer, for these tasks, and show that they outperform statistically-trained baselines and automatic S2S parallelization compilers in both classifying the overall need for an OpenMP directive and the introduction of private and reduction clauses. Our source code and database are available at: https://github.com/Scientific-Computing-Lab-NRCN/PragFormer.  ( 2 min )
    Supervised Contrastive CSI Representation Learning for Massive MIMO Positioning. (arXiv:2204.12796v1 [cs.IT])
    Similarity metric is crucial for massive MIMO positioning utilizing channel state information~(CSI). In this letter, we propose a novel massive MIMO CSI similarity learning method via deep convolutional neural network~(DCNN) and contrastive learning. A contrastive loss function is designed considering multiple positive and negative CSI samples drawn from a training dataset. The DCNN encoder is trained using the loss so that positive samples are mapped to points close to the anchor's encoding, while encodings of negative samples are kept away from the anchor's in the representation space. Evaluation results of fingerprint-based positioning on a real-world CSI dataset show that the learned similarity metric improves positioning accuracy significantly compared with other known state-of-the-art methods.  ( 2 min )
    GTNet: A Tree-Based Deep Graph Learning Architecture. (arXiv:2204.12802v1 [cs.LG])
    We propose Graph Tree Networks (GTNets), a deep graph learning architecture with a new general message passing scheme that originates from the tree representation of graphs. In the tree representation, messages propagate upward from the leaf nodes to the root node, and each node preserves its initial information prior to receiving information from its child nodes (neighbors). We formulate a general propagation rule following the nature of message passing in the tree to update a node's feature by aggregating its initial feature and its neighbor nodes' updated features. Two graph representation learning models are proposed within this GTNet architecture - Graph Tree Attention Network (GTAN) and Graph Tree Convolution Network (GTCN), with experimentally demonstrated state-of-the-art performance on several popular benchmark datasets. Unlike the vanilla Graph Attention Network (GAT) and Graph Convolution Network (GCN) which have the "over-smoothing" issue, the proposed GTAN and GTCN models can go deep as demonstrated by comprehensive experiments and rigorous theoretical analysis.  ( 2 min )
    When Performance is not Enough -- A Multidisciplinary View on Clinical Decision Support. (arXiv:2204.12810v1 [cs.LG])
    Scientific publications about machine learning in healthcare are often about implementing novel methods and boosting the performance - at least from a computer science perspective. However, beyond such often short-lived improvements, much more needs to be taken into consideration if we want to arrive at a sustainable progress in healthcare. What does it take to actually implement such a system, make it usable for the domain expert, and possibly bring it into practical usage? Targeted at Computer Scientists, this work presents a multidisciplinary view on machine learning in medical decision support systems and covers information technology, medical, as well as ethical aspects. Along with an implemented risk prediction system in nephrology, challenges and lessons learned in a pilot project are presented.  ( 2 min )
    Learning Green's functions associated with parabolic partial differential equations. (arXiv:2204.12789v1 [math.NA])
    Given input-output pairs from a parabolic partial differential equation (PDE) in any spatial dimension $n\geq 1$, we derive the first theoretically rigorous scheme for learning the associated Green's function $G$. Until now, rigorously learning Green's functions associated with parabolic operators has been a major challenge in the field of scientific machine learning because $G$ may not be square-integrable when $n>1$, and time-dependent PDEs have transient dynamics. By combining the hierarchical low-rank structure of $G$ together with the randomized singular value decomposition, we construct an approximant to $G$ that achieves a relative error of $\smash{\mathcal{O}(\Gamma_\epsilon^{-1/2}\epsilon)}$ in the $L^1$-norm with high probability by using at most $\smash{\mathcal{O}(\epsilon^{-\frac{n+2}{2}}\log(1/\epsilon))}$ input-output training pairs, where $\Gamma_\epsilon$ is a measure of the quality of the training dataset for learning $G$, and $\epsilon>0$ is sufficiently small. Along the way, we extend the low-rank theory of Bebendorf and Hackbusch from elliptic PDEs in dimension $1\leq n\leq 3$ to parabolic PDEs in any dimensions, which shows that Green's functions associated with parabolic PDEs admit a low-rank structure on well-separated domains.  ( 2 min )
    SVD Perspectives for Augmenting DeepONet Flexibility and Interpretability. (arXiv:2204.12670v1 [cs.LG])
    Deep operator networks (DeepONets) are powerful architectures for fast and accurate emulation of complex dynamics. As their remarkable generalization capabilities are primarily enabled by their projection-based attribute, we investigate connections with low-rank techniques derived from the singular value decomposition (SVD). We demonstrate that some of the concepts behind proper orthogonal decomposition (POD)-neural networks can improve DeepONet's design and training phases. These ideas lead us to a methodology extension that we name SVD-DeepONet. Moreover, through multiple SVD analyses, we find that DeepONet inherits from its projection-based attribute strong inefficiencies in representing dynamics characterized by symmetries. Inspired by the work on shifted-POD, we develop flexDeepONet, an architecture enhancement that relies on a pre-transformation network for generating a moving reference frame and isolating the rigid components of the dynamics. In this way, the physics can be represented on a latent space free from rotations, translations, and stretches, and an accurate projection can be performed to a low-dimensional basis. In addition to flexibility and interpretability, the proposed perspectives increase DeepONet's generalization capabilities and computational efficiencies. For instance, we show flexDeepONet can accurately surrogate the dynamics of 19 variables in a combustion chemistry application by relying on 95% less trainable parameters than the ones of the vanilla architecture. We argue that DeepONet and SVD-based methods can reciprocally benefit from each other. In particular, the flexibility of the former in leveraging multiple data sources and multifidelity knowledge in the form of both unstructured data and physics-informed constraints has the potential to greatly extend the applicability of methodologies such as POD and PCA.  ( 2 min )
    Accelerated Continuous-Time Approximate Dynamic Programming via Data-Assisted Hybrid Control. (arXiv:2204.12707v1 [math.OC])
    We introduce a new closed-loop architecture for the online solution of approximate optimal control problems in the context of continuous-time systems. Specifically, we introduce the first algorithm that incorporates dynamic momentum in actor-critic structures to control continuous-time dynamic plants with an affine structure in the input. By incorporating dynamic momentum in our algorithm, we are able to accelerate the convergence properties of the closed-loop system, achieving superior transient performance compared to traditional gradient-descent based techniques. In addition, by leveraging the existence of past recorded data with sufficiently rich information properties, we dispense with the persistence of excitation condition traditionally imposed on the regressors of the critic and the actor. Given that our continuous-time momentum-based dynamics also incorporate periodic discrete-time resets that emulate restarting techniques used in the machine learning literature, we leverage tools from hybrid dynamical systems theory to establish asymptotic stability properties for the closed-loop system. We illustrate our results with a numerical example.  ( 2 min )
    Human-Centered Prior-Guided and Task-Dependent Multi-Task Representation Learning for Action Recognition Pre-Training. (arXiv:2204.12729v1 [cs.CV])
    Recently, much progress has been made for self-supervised action recognition. Most existing approaches emphasize the contrastive relations among videos, including appearance and motion consistency. However, two main issues remain for existing pre-training methods: 1) the learned representation is neutral and not informative for a specific task; 2) multi-task learning-based pre-training sometimes leads to sub-optimal solutions due to inconsistent domains of different tasks. To address the above issues, we propose a novel action recognition pre-training framework, which exploits human-centered prior knowledge that generates more informative representation, and avoids the conflict between multiple tasks by using task-dependent representations. Specifically, we distill knowledge from a human parsing model to enrich the semantic capability of representation. In addition, we combine knowledge distillation with contrastive learning to constitute a task-dependent multi-task framework. We achieve state-of-the-art performance on two popular benchmarks for action recognition task, i.e., UCF101 and HMDB51, verifying the effectiveness of our method.  ( 2 min )
    The Multimarginal Optimal Transport Formulation of Adversarial Multiclass Classification. (arXiv:2204.12676v1 [cs.LG])
    We study a family of adversarial multiclass classification problems and provide equivalent reformulations in terms of: 1) a family of generalized barycenter problems introduced in the paper and 2) a family of multimarginal optimal transport problems where the number of marginals is equal to the number of classes in the original classification problem. These new theoretical results reveal a rich geometric structure of adversarial learning problems in multiclass classification and extend recent results restricted to the binary classification setting. A direct computational implication of our results is that by solving either the barycenter problem and its dual, or the MOT problem and its dual, we can recover the optimal robust classification rule and the optimal adversarial strategy for the original adversarial problem. Examples with synthetic and real data illustrate our results.  ( 2 min )
    A Multi-Head Convolutional Neural Network With Multi-path Attention improves Image Denoising. (arXiv:2204.12736v1 [cs.CV])
    Recently, convolutional neural networks (CNNs) and attention mechanisms have been widely used in image denoising and achieved satisfactory performance. However, the previous works mostly use a single head to receive the noisy image, limiting the richness of extracted features. Therefore, a novel CNN with multiple heads (MH) named MHCNN is proposed in this paper, whose heads will receive the input images rotated by different rotation angles. MH makes MHCNN simultaneously utilize features of rotated images to remove noise. We also present a novel multi-path attention mechanism (MPA) to integrate these features effectively. Unlike previous attention mechanisms that handle pixel-level, channel-level, and patch-level features, MPA focuses on features at the image level. Experiments show MHCNN surpasses other state-of-the-art CNN models on additive white Gaussian noise (AWGN) denoising and real-world image denoising. Its peak signal-to-noise ratio (PSNR) results are higher than other networks, such as DnCNN, BRDNet, RIDNet, PAN-Net, and CSANN. It is also demonstrated that the proposed MH with MPA mechanism can be used as a pluggable component.  ( 2 min )
    DraftRec: Personalized Draft Recommendation for Winning in Multi-Player Online Battle Arena Games. (arXiv:2204.12750v1 [cs.AI])
    This paper presents a personalized character recommendation system for Multiplayer Online Battle Arena (MOBA) games which are considered as one of the most popular online video game genres around the world. When playing MOBA games, players go through a draft stage, where they alternately select a virtual character to play. When drafting, players select characters by not only considering their character preferences, but also the synergy and competence of their team's character combination. However, the complexity of drafting induces difficulties for beginners to choose the appropriate characters based on the characters of their team while considering their own champion preferences. To alleviate this problem, we propose DraftRec, a novel hierarchical model which recommends characters by considering each player's champion preferences and the interaction between the players. DraftRec consists of two networks: the player network and the match network. The player network captures the individual player's champion preference, and the match network integrates the complex relationship between the players and their respective champions. We train and evaluate our model from a manually collected 280,000 matches of League of Legends and a publicly available 50,000 matches of Dota2. Empirically, our method achieved state-of-the-art performance in character recommendation and match outcome prediction task. Furthermore, a comprehensive user survey confirms that DraftRec provides convincing and satisfying recommendations. Our code and dataset are available at https://github.com/dojeon-ai/DraftRec.  ( 2 min )
    Heterogeneous Ensemble Knowledge Transfer for Training Large Models in Federated Learning. (arXiv:2204.12703v1 [cs.LG])
    Federated learning (FL) enables edge-devices to collaboratively learn a model without disclosing their private data to a central aggregating server. Most existing FL algorithms require models of identical architecture to be deployed across the clients and server, making it infeasible to train large models due to clients' limited system resources. In this work, we propose a novel ensemble knowledge transfer method named Fed-ET in which small models (different in architecture) are trained on clients, and used to train a larger model at the server. Unlike in conventional ensemble learning, in FL the ensemble can be trained on clients' highly heterogeneous data. Cognizant of this property, Fed-ET uses a weighted consensus distillation scheme with diversity regularization that efficiently extracts reliable consensus from the ensemble while improving generalization by exploiting the diversity within the ensemble. We show the generalization bound for the ensemble of weighted models trained on heterogeneous datasets that supports the intuition of Fed-ET. Our experiments on image and language tasks show that Fed-ET significantly outperforms other state-of-the-art FL algorithms with fewer communicated parameters, and is also robust against high data-heterogeneity.  ( 2 min )
    Data-based price discrimination: information theoretic limitations and a minimax optimal strategy. (arXiv:2204.12723v1 [cs.GT])
    This paper studies the gap between the classical pricing theory and the data-based pricing theory. We focus on the problem of price discrimination with a continuum of buyer types based on a finite sample of observations. Our first set of results provides sharp lower bounds in the worst-case scenario for the discrepancy between any data-based pricing strategies and the theoretical optimal third-degree price discrimination (3PD) strategy (respectively, uniform pricing strategy) derived from the distribution (where the sample is drawn) ranging over a large class of distributions. Consequently, there is an inevitable gap between revenues based on any data-based pricing strategy and the revenue based on the theoretical optimal 3PD (respectively, uniform pricing) strategy. We then propose easy-to-implement data-based 3PD and uniform pricing strategies and show each strategy is minimax optimal in the sense that the gap between their respective revenue and the revenue based on the theoretical optimal 3PD (respectively, uniform pricing) strategy matches our worst-case lower bounds up to constant factors (that are independent of the sample size $n$). We show that 3PD strategies are revenue superior to uniform pricing strategies if and only if the sample size $n$ is large enough. In other words, if $n$ is below a threshold, uniform pricing strategies are revenue superior to 3PD strategies. We further provide upper bounds for the gaps between the welfare generated by our minimax optimal 3PD (respectively, uniform pricing) strategy and the welfare based on the theoretical optimal 3PD (respectively, uniform pricing) strategy.  ( 2 min )
    Relational Abstractions for Generalized Reinforcement Learning on Symbolic Problems. (arXiv:2204.12665v1 [cs.LG])
    Reinforcement learning in problems with symbolic state spaces is challenging due to the need for reasoning over long horizons. This paper presents a new approach that utilizes relational abstractions in conjunction with deep learning to learn a generalizable Q-function for such problems. The learned Q-function can be efficiently transferred to related problems that have different object names and object quantities, and thus, entirely different state spaces. We show that the learned generalized Q-function can be utilized for zero-shot transfer to related problems without an explicit, hand-coded curriculum. Empirical evaluations on a range of problems show that our method facilitates efficient zero-shot transfer of learned knowledge to much larger problem instances containing many objects.  ( 2 min )
    Machines of finite depth: towards a formalization of neural networks. (arXiv:2204.12786v1 [cs.LG])
    We provide a unifying framework where artificial neural networks and their architectures can be formally described as particular cases of a general mathematical construction--machines of finite depth. Unlike neural networks, machines have a precise definition, from which several properties follow naturally. Machines of finite depth are modular (they can be combined), efficiently computable and differentiable. The backward pass of a machine is again a machine and can be computed without overhead using the same procedure as the forward pass. We prove this statement theoretically and practically, via a unified implementation that generalizes several classical architectures--dense, convolutional, and recurrent neural networks with a rich shortcut structure--and their respective backpropagation rules.  ( 2 min )
    Understanding A Class of Decentralized and Federated Optimization Algorithms: A Multi-Rate Feedback Control Perspective. (arXiv:2204.12663v1 [cs.LG])
    Distributed algorithms have been playing an increasingly important role in many applications such as machine learning, signal processing, and control. Significant research efforts have been devoted to developing and analyzing new algorithms for various applications. In this work, we provide a fresh perspective to understand, analyze, and design distributed optimization algorithms. Through the lens of multi-rate feedback control, we show that a wide class of distributed algorithms, including popular decentralized/federated schemes, can be viewed as discretizing a certain continuous-time feedback control system, possibly with multiple sampling rates, such as decentralized gradient descent, gradient tracking, and federated averaging. This key observation not only allows us to develop a generic framework to analyze the convergence of the entire algorithm class. More importantly, it also leads to an interesting way of designing new distributed algorithms. We develop the theory behind our framework and provide examples to highlight how the framework can be used in practice.  ( 2 min )
    Gaussian Kernel Variance For an Adaptive Learning Method on Signals Over Graphs. (arXiv:2204.12629v1 [eess.SP])
    This paper discusses a special kind of a simple yet possibly powerful algorithm, called single-kernel Gradraker (SKG), which is an adaptive learning method predicting unknown nodal values in a network using known nodal values and the network structure. We aim to find out how to configure the special kind of the model in applying the algorithm. To be more specific, we focus on SKG with a Gaussian kernel and specify how to find a suitable variance for the kernel. To do so, we introduce two variables with which we are able to set up requirements on the variance of the Gaussian kernel to achieve (near-) optimal performance and can better understand how SKG works. Our contribution is that we introduce two variables as analysis tools, illustrate how predictions will be affected under different Gaussian kernels, and provide an algorithm finding a suitable Gaussian kernel for SKG with knowledge about the training network. Simulation results on real datasets are provided.  ( 2 min )
    Bounded Memory Adversarial Bandits with Composite Anonymous Delayed Feedback. (arXiv:2204.12764v1 [cs.LG])
    We study the adversarial bandit problem with composite anonymous delayed feedback. In this setting, losses of an action are split into $d$ components, spreading over consecutive rounds after the action is chosen. And in each round, the algorithm observes the aggregation of losses that come from the latest $d$ rounds. Previous works focus on oblivious adversarial setting, while we investigate the harder non-oblivious setting. We show non-oblivious setting incurs $\Omega(T)$ pseudo regret even when the loss sequence is bounded memory. However, we propose a wrapper algorithm which enjoys $o(T)$ policy regret on many adversarial bandit problems with the assumption that the loss sequence is bounded memory. Especially, for $K$-armed bandit and bandit convex optimization, we have $\mathcal{O}(T^{2/3})$ policy regret bound. We also prove a matching lower bound for $K$-armed bandit. Our lower bound works even when the loss sequence is oblivious but the delay is non-oblivious. It answers the open problem proposed in \cite{wang2021adaptive}, showing that non-oblivious delay is enough to incur $\tilde{\Omega}(T^{2/3})$ regret.  ( 2 min )
    SCGC : Self-Supervised Contrastive Graph Clustering. (arXiv:2204.12656v1 [cs.LG])
    Graph clustering discovers groups or communities within networks. Deep learning methods such as autoencoders (AE) extract effective clustering and downstream representations but cannot incorporate rich structural information. While Graph Neural Networks (GNN) have shown great success in encoding graph structure, typical GNNs based on convolution or attention variants suffer from over-smoothing, noise, heterophily, are computationally expensive and typically require the complete graph being present. Instead, we propose Self-Supervised Contrastive Graph Clustering (SCGC), which imposes graph-structure via contrastive loss signals to learn discriminative node representations and iteratively refined soft cluster labels. We also propose SCGC*, with a more effective, novel, Influence Augmented Contrastive (IAC) loss to fuse richer structural information, and half the original model parameters. SCGC(*) is faster with simple linear units, completely eliminate convolutions and attention of traditional GNNs, yet efficiently incorporates structure. It is impervious to layer depth and robust to over-smoothing, incorrect edges and heterophily. It is scalable by batching, a limitation in many prior GNN models, and trivially parallelizable. We obtain significant improvements over state-of-the-art on a wide range of benchmark graph datasets, including images, sensor data, text, and citation networks efficiently. Specifically, 20% on ARI and 18% on NMI for DBLP; overall 55% reduction in training time and overall, 81% reduction on inference time. Our code is available at : https://github.com/gayanku/SCGC  ( 2 min )
    Evaluation of Self-taught Learning-based Representations for Facial Emotion Recognition. (arXiv:2204.12624v1 [cs.CV])
    This work describes different strategies to generate unsupervised representations obtained through the concept of self-taught learning for facial emotion recognition (FER). The idea is to create complementary representations promoting diversity by varying the autoencoders' initialization, architecture, and training data. SVM, Bagging, Random Forest, and a dynamic ensemble selection method are evaluated as final classification methods. Experimental results on Jaffe and Cohn-Kanade datasets using a leave-one-subject-out protocol show that FER methods based on the proposed diverse representations compare favorably against state-of-the-art approaches that also explore unsupervised feature learning.  ( 2 min )
    Meta-Learning Based Early Fault Detection for Rolling Bearings via Few-Shot Anomaly Detection. (arXiv:2204.12637v1 [cs.LG])
    Early fault detection (EFD) of rolling bearings can recognize slight deviation of the health states and contribute to the stability of mechanical systems. In practice, very limited target bearing data are available to conduct EFD, which makes it hard to adapt to the EFD task of new bearings. To address this problem, many transfer learning based EFD methods utilize historical data to learn transferable domain knowledge and conduct early fault detection on new target bearings. However, most existing methods only consider the distribution drift across different working conditions but ignore the difference between bearings under the same working condition, which is called Unit-to-Unit Variability (UtUV). The setting of EFD with limited target data considering UtUV can be formulated as a Few-shot Anomaly Detection task. Therefore, this paper proposes a novel EFD method based on meta-learning considering UtUV. The proposed method can learn a generic metric based on Relation Network (RN) to measure the similarity between normal data and the new arrival target bearing data. Besides, the proposed method utilizes a health state embedding strategy to decrease false alarms. The performance of proposed method is tested on two bearing datasets. The results show that the proposed method can detect incipient faults earlier than the baselines with lower false alarms.  ( 2 min )
    Novel Applications for VAE-based Anomaly Detection Systems. (arXiv:2204.12577v1 [cs.LG])
    The recent rise in deep learning technologies fueled innovation and boosted scientific research. Their achievements enabled new research directions for deep generative modeling (DGM), an increasingly popular approach that can create novel and unseen data, starting from a given data set. As the technology shows promising applications, many ethical issues also arise. For example, their misuse can enable disinformation campaigns and powerful phishing attempts. Research also indicates different biases affect deep learning models, leading to social issues such as misrepresentation. In this work, we formulate a novel setting to deal with similar problems, showing that a repurposed anomaly detection system effectively generates novel data, avoiding generating specified unwanted data. We propose Variational Auto-encoding Binary Classifiers (V-ABC): a novel model that repurposes and extends the Auto-encoding Binary Classifier (ABC) anomaly detector, using the Variational Auto-encoder (VAE). We survey the limitations of existing approaches and explore many tools to show the model's inner workings in an interpretable way. This proposal has excellent potential for generative applications: models that rely on user-generated data could automatically filter out unwanted content, such as offensive language, obscene images, and misleading information.  ( 2 min )
    Self-scalable Tanh (Stan): Faster Convergence and Better Generalization in Physics-informed Neural Networks. (arXiv:2204.12589v1 [cs.LG])
    Physics-informed Neural Networks (PINNs) are gaining attention in the engineering and scientific literature for solving a range of differential equations with applications in weather modeling, healthcare, manufacturing, and so on. Poor scalability is one of the barriers to utilizing PINNs for many real-world problems. To address this, a Self-scalable tanh (Stan) activation function is proposed for the PINNs. The proposed Stan function is smooth, non-saturating, and has a trainable parameter. During training, it can allow easy flow of gradients to compute the required derivatives and also enable systematic scaling of the input-output mapping. It is also shown theoretically that the PINN with the proposed Stan function has no spurious stationary points when using gradient descent algorithms. The proposed Stan is tested on a couple of numerical studies involving general regression problems. It is subsequently used for solving multiple forward problems, which involve second-order derivatives and multiple dimensions, and an inverse problem where the thermal diffusivity is predicted through heat conduction in a rod. Our results of these case studies establish empirically that the Stan activation function can achieve better training and more accurate predictions than the state-of-the-art activation functions.  ( 2 min )
    Rate-Constrained Remote Contextual Bandits. (arXiv:2204.12620v1 [cs.LG])
    We consider a rate-constrained contextual multi-armed bandit (RC-CMAB) problem, in which a group of agents are solving the same contextual multi-armed bandit (CMAB) problem. However, the contexts are observed by a remotely connected entity, i.e., the decision-maker, that updates the policy to maximize the returned rewards, and communicates the arms to be sampled by the agents to a controller over a rate-limited communications channel. This framework can be applied to personalized ad placement, whenever the content owner observes the website visitors, and hence has the context, but needs to transmit the ads to be shown to a controller that is in charge of placing the marketing content. Consequently, the rate-constrained CMAB (RC-CMAB) problem requires the study of lossy compression schemes for the policy to be employed whenever the constraint on the channel rate does not allow the uncompressed transmission of the decision-maker's intentions. We characterize the fundamental information theoretic limits of this problem by letting the number of agents go to infinity, and study the regret that can be achieved, identifying the two distinct rate regions leading to linear and sub-linear regrets respectively. We then analyze the optimal compression scheme achievable in the limit with infinite agents, when using the forward and reverse KL divergence as distortion metric. Based on this, we also propose a practical coding scheme, and provide numerical results.  ( 2 min )
    RAMBO-RL: Robust Adversarial Model-Based Offline Reinforcement Learning. (arXiv:2204.12581v1 [cs.LG])
    Offline reinforcement learning (RL) aims to find near-optimal policies from logged data without further environment interaction. Model-based algorithms, which learn a model of the environment from the dataset and perform conservative policy optimisation within that model, have emerged as a promising approach to this problem. In this work, we present Robust Adversarial Model-Based Offline RL (RAMBO), a novel approach to model-based offline RL. To achieve conservatism, we formulate the problem as a two-player zero sum game against an adversarial environment model. The model is trained minimise the value function while still accurately predicting the transitions in the dataset, forcing the policy to act conservatively in areas not covered by the dataset. To approximately solve the two-player game, we alternate between optimising the policy and optimising the model adversarially. The problem formulation that we address is theoretically grounded, resulting in a PAC performance guarantee and a pessimistic value function which lower bounds the value function in the true environment. We evaluate our approach on widely studied offline RL benchmarks, and demonstrate that our approach achieves state of the art performance.  ( 2 min )
    hate-alert@DravidianLangTech-ACL2022: Ensembling Multi-Modalities for Tamil TrollMeme Classification. (arXiv:2204.12587v1 [cs.MM])
    Social media platforms often act as breeding grounds for various forms of trolling or malicious content targeting users or communities. One way of trolling users is by creating memes, which in most cases unites an image with a short piece of text embedded on top of it. The situation is more complex for multilingual(e.g., Tamil) memes due to the lack of benchmark datasets and models. We explore several models to detect Troll memes in Tamil based on the shared task, "Troll Meme Classification in DravidianLangTech2022" at ACL-2022. We observe while the text-based model MURIL performs better for Non-troll meme classification, the image-based model VGG16 performs better for Troll-meme classification. Further fusing these two modalities help us achieve stable outcomes in both classes. Our fusion model achieved a 0.561 weighted average F1 score and ranked second in this task.  ( 2 min )
    Protein 3D structure-based neural networks highly improve the accuracy in compound-protein binding affinity prediction. (arXiv:2204.12586v1 [q-bio.BM])
    Theoretically, the accuracy of computational models in predicting compound-protein binding affinities (CPAs) could be improved by the introduction of protein 3D structure information. However, most of these models still suffer from a low accuracy due to the lack of an efficient approach to encode informative protein features. The major challenge is how to combine the multi-modal information such as the residue sequence of the protein, residue atom coordinates and the torsion angles. To tackle this problem, we develop Fast Evolutional Attention and Thoroughgoing-graph Neural Networks (FeatNN) to facilitate the application of protein 3D structure information for predicting CPAs. Specifically, we established a novel end-to-end architecture to jointly embed torsion matrix, discrete distance matrix, and sequence information of protein and extract compound features with deep graph convolution layers. In addition, a new pairwise mapping attention mechanism is introduced to comprehensively learn potential interaction information between proteins and compounds. FeatNN considerably outperforms various state-of-the-art baselines in CPA prediction with the Pearson value elevated by about 35.7%. Thus, FeatNN provides an outstanding method for highly accurate CPA prediction and facilitates high-throughput virtual screening of drug candidates.  ( 2 min )
    Learning Eco-Driving Strategies at Signalized Intersections. (arXiv:2204.12561v1 [eess.SY])
    Signalized intersections in arterial roads result in persistent vehicle idling and excess accelerations, contributing to fuel consumption and CO2 emissions. There has thus been a line of work studying eco-driving control strategies to reduce fuel consumption and emission levels at intersections. However, methods to devise effective control strategies across a variety of traffic settings remain elusive. In this paper, we propose a reinforcement learning (RL) approach to learn effective eco-driving control strategies. We analyze the potential impact of a learned strategy on fuel consumption, CO2 emission, and travel time and compare with naturalistic driving and model-based baselines. We further demonstrate the generalizability of the learned policies under mixed traffic scenarios. Simulation results indicate that scenarios with 100% penetration of connected autonomous vehicles (CAV) may yield as high as 18% reduction in fuel consumption and 25% reduction in CO2 emission levels while even improving travel speed by 20%. Furthermore, results indicate that even 25% CAV penetration can bring at least 50% of the total fuel and emission reduction benefits.  ( 2 min )
    SoFaiR: Single Shot Fair Representation Learning. (arXiv:2204.12556v1 [cs.LG])
    To avoid discriminatory uses of their data, organizations can learn to map them into a representation that filters out information related to sensitive attributes. However, all existing methods in fair representation learning generate a fairness-information trade-off. To achieve different points on the fairness-information plane, one must train different models. In this paper, we first demonstrate that fairness-information trade-offs are fully characterized by rate-distortion trade-offs. Then, we use this key result and propose SoFaiR, a single shot fair representation learning method that generates with one trained model many points on the fairness-information plane. Besides its computational saving, our single-shot approach is, to the extent of our knowledge, the first fair representation learning method that explains what information is affected by changes in the fairness / distortion properties of the representation. Empirically, we find on three datasets that SoFaiR achieves similar fairness-information trade-offs as its multi-shot counterparts.  ( 2 min )
    Surrogate Assisted Evolutionary Multi-objective Optimisation applied to a Pressure Swing Adsorption system. (arXiv:2204.12585v1 [cs.NE])
    Chemical plant design and optimisation have proven challenging due to the complexity of these real-world systems. The resulting complexity translates into high computational costs for these systems' mathematical formulations and simulation models. Research has illustrated the benefits of using machine learning surrogate models as substitutes for computationally expensive models during optimisation. This paper extends recent research into optimising chemical plant design and operation. The study further explores Surrogate Assisted Genetic Algorithms (SA-GA) in more complex variants of the original plant design and optimisation problems, such as the inclusion of parallel and feedback components. The novel extension to the original algorithm proposed in this study, Surrogate Assisted NSGA-\Romannum{2} (SA-NSGA), was tested on a popular literature case, the Pressure Swing Adsorption (PSA) system. We further provide extensive experimentation, comparing various meta-heuristic optimisation techniques and numerous machine learning models as surrogates. The results for both sets of systems illustrate the benefits of using Genetic Algorithms as an optimisation framework for complex chemical plant system design and optimisation for both single and multi-objective scenarios. We confirm that Random Forest surrogate assisted Evolutionary Algorithms can be scaled to increasingly complex chemical systems with parallel and feedback components. We further find that combining a Genetic Algorithm framework with Machine Learning Surrogate models as a substitute for long-running simulation models yields significant computational efficiency improvements, 1.7 - 1.84 times speedup for the increased complexity examples and a 2.7 times speedup for the Pressure Swing Adsorption system.  ( 2 min )
    Toward Policy Explanations for Multi-Agent Reinforcement Learning. (arXiv:2204.12568v1 [cs.AI])
    Advances in multi-agent reinforcement learning(MARL) enable sequential decision making for a range of exciting multi-agent applications such as cooperative AI and autonomous driving. Explaining agent decisions are crucial for improving system transparency, increasing user satisfaction, and facilitating human-agent collaboration. However, existing works on explainable reinforcement learning mostly focus on the single-agent setting and are not suitable for addressing challenges posed by multi-agent environments. We present novel methods to generate two types of policy explanations for MARL: (i) policy summarization about the agent cooperation and task sequence, and (ii) language explanations to answer queries about agent behavior. Experimental results on three MARL domains demonstrate the scalability of our methods. A user study shows that the generated explanations significantly improve user performance and increase subjective ratings on metrics such as user satisfaction.  ( 2 min )
    Process Knowledge-infused Learning for Suicidality Assessment on Social Media. (arXiv:2204.12560v1 [cs.AI])
    Improving the performance and natural language explanations of deep learning algorithms is a priority for adoption by humans in the real world. In several domains, such as healthcare, such technology has significant potential to reduce the burden on humans by providing quality assistance at scale. However, current methods rely on the traditional pipeline of predicting labels from data, thus completely ignoring the process and guidelines used to obtain the labels. Furthermore, post hoc explanations on the data to label prediction using explainable AI (XAI) models, while satisfactory to computer scientists, leave much to be desired to the end-users due to lacking explanations of the process in terms of human-understandable concepts. We \textit{introduce}, \textit{formalize}, and \textit{develop} a novel Artificial Intelligence (A) paradigm -- Process Knowledge-infused Learning (PK-iL). PK-iL utilizes a structured process knowledge that explicitly explains the underlying prediction process that makes sense to end-users. The qualitative human evaluation confirms through a annotator agreement of 0.72, that humans are understand explanations for the predictions. PK-iL also performs competitively with the state-of-the-art (SOTA) baselines.  ( 2 min )
    Data Bootstrapping Approaches to Improve Low Resource Abusive Language Detection for Indic Languages. (arXiv:2204.12543v1 [cs.CL])
    Abusive language is a growing concern in many social media platforms. Repeated exposure to abusive speech has created physiological effects on the target users. Thus, the problem of abusive language should be addressed in all forms for online peace and safety. While extensive research exists in abusive speech detection, most studies focus on English. Recently, many smearing incidents have occurred in India, which provoked diverse forms of abusive speech in online space in various languages based on the geographic location. Therefore it is essential to deal with such malicious content. In this paper, to bridge the gap, we demonstrate a large-scale analysis of multilingual abusive speech in Indic languages. We examine different interlingual transfer mechanisms and observe the performance of various multilingual models for abusive speech detection for eight different Indic languages. We also experiment to show how robust these models are on adversarial attacks. Finally, we conduct an in-depth error analysis by looking into the models' misclassified posts across various settings. We have made our code and models public for other researchers.  ( 2 min )
    Double Diffusion Maps and their Latent Harmonics for Scientific Computations in Latent Space. (arXiv:2204.12536v1 [stat.ML])
    We introduce a data-driven approach to building reduced dynamical models through manifold learning; the reduced latent space is discovered using Diffusion Maps (a manifold learning technique) on time series data. A second round of Diffusion Maps on those latent coordinates allows the approximation of the reduced dynamical models. This second round enables mapping the latent space coordinates back to the full ambient space (what is called lifting); it also enables the approximation of full state functions of interest in terms of the reduced coordinates. In our work, we develop and test three different reduced numerical simulation methodologies, either through pre-tabulation in the latent space and integration on the fly or by going back and forth between the ambient space and the latent space. The data-driven latent space simulation results, based on the three different approaches, are validated through (a) the latent space observation of the full simulation through the Nystr\"om Extension formula, or through (b) lifting the reduced trajectory back to the full ambient space, via Latent Harmonics. Latent space modeling often involves additional regularization to favor certain properties of the space over others, and the mapping back to the ambient space is then constructed mostly independently from these properties; here, we use the same data-driven approach to construct the latent space and then map back to the ambient space.  ( 2 min )
    Identification of feasible pathway information for c-di-GMP binding proteins in cellulose production. (arXiv:2204.12526v1 [q-bio.QM])
    In this paper, we utilize a machine learning approach to identify the significant pathways for c-di-GMP signaling proteins. The dataset involves gene counts from 12 pathways and 5 essential c-di-GMP binding domains for 1024 bacterial genomes. Two novel approaches, Least absolute shrinkage and selection operator (Lasso) and Random forests, have been applied for analyzing and modeling the dataset. Both approaches show that bacterial chemotaxis is the most essential pathway for c-di-GMP encoding domains. Though popular for feature selection, the strong regularization of Lasso method fails to associate any pathway to MshE domain. Results from the analysis may help to understand and emphasize the supporting pathways involved in bacterial cellulose production. These findings demonstrate the need for a chassis to restrict the behavior or functionality by deactivating the selective pathways in cellulose production.  ( 2 min )
    Multi stain graph fusion for multimodal integration in pathology. (arXiv:2204.12541v1 [eess.IV])
    In pathology, tissue samples are assessed using multiple staining techniques to enhance contrast in unique histologic features. In this paper, we introduce a multimodal CNN-GNN based graph fusion approach that leverages complementary information from multiple non-registered histopathology images to predict pathologic scores. We demonstrate this approach in nonalcoholic steatohepatitis (NASH) by predicting CRN fibrosis stage and NAFLD Activity Score (NAS). Primary assessment of NASH typically requires liver biopsy evaluation on two histological stains: Trichrome (TC) and hematoxylin and eosin (H&E). Our multimodal approach learns to extract complementary information from TC and H&E graphs corresponding to each stain while simultaneously learning an optimal policy to combine this information. We report up to 20% improvement in predicting fibrosis stage and NAS component grades over single-stain modeling approaches, measured by computing linearly weighted Cohen's kappa between machine-derived vs. pathologist consensus scores. Broadly, this paper demonstrates the value of leveraging diverse pathology images for improved ML-powered histologic assessment.  ( 2 min )
    Application of WGAN-GP in recommendation and Questioning the relevance of GAN-based approaches. (arXiv:2204.12527v1 [cs.IR])
    Many neural-based recommender systems were proposed in recent years and part of them used Generative Adversarial Networks (GAN) to model user-item interactions. However, the exploration of Wasserstein GAN with Gradient Penalty (WGAN-GP) on recommendation has received relatively less scrutiny. In this paper, we focus on two questions: 1- Can we successfully apply WGAN-GP on recommendation and does this approach give an advantage compared to the best GAN models? 2- Are GAN-based recommender systems relevant? To answer the first question, we propose a recommender system based on WGAN-GP called CFWGAN-GP which is founded on a previous model (CFGAN). We successfully applied our method on real-world datasets on the top-k recommendation task and the empirical results show that it is competitive with state-of-the-art GAN approaches, but we found no evidence of significant advantage of using WGAN-GP instead of the original GAN, at least from the accuracy point of view. As for the second question, we conduct a simple experiment in which we show that a well-tuned conceptually simpler method outperforms GAN-based models by a considerable margin, questioning the use of such models.  ( 2 min )
    Self-Supervised Information Bottleneck for Deep Multi-View Subspace Clustering. (arXiv:2204.12496v1 [cs.LG])
    In this paper, we explore the problem of deep multi-view subspace clustering framework from an information-theoretic point of view. We extend the traditional information bottleneck principle to learn common information among different views in a self-supervised manner, and accordingly establish a new framework called Self-supervised Information Bottleneck based Multi-view Subspace Clustering (SIB-MSC). Inheriting the advantages from information bottleneck, SIB-MSC can learn a latent space for each view to capture common information among the latent representations of different views by removing superfluous information from the view itself while retaining sufficient information for the latent representations of other views. Actually, the latent representation of each view provides a kind of self-supervised signal for training the latent representations of other views. Moreover, SIB-MSC attempts to learn the other latent space for each view to capture the view-specific information by introducing mutual information based regularization terms, so as to further improve the performance of multi-view subspace clustering. To the best of our knowledge, this is the first work to explore information bottleneck for multi-view subspace clustering. Extensive experiments on real-world multi-view data demonstrate that our method achieves superior performance over the related state-of-the-art methods.  ( 2 min )
    Enhancing Privacy against Inversion Attacks in Federated Learning by using Mixing Gradients Strategies. (arXiv:2204.12495v1 [cs.LG])
    Federated learning reduces the risk of information leakage, but remains vulnerable to attacks. We investigate how several neural network design decisions can defend against gradients inversion attacks. We show that overlapping gradients provides numerical resistance to gradient inversion on the highly vulnerable dense layer. Specifically, we propose to leverage batching to maximise mixing of gradients by choosing an appropriate loss function and drawing identical labels. We show that otherwise it is possible to directly recover all vectors in a mini-batch without any numerical optimisation due to the de-mixing nature of the cross entropy loss. To accurately assess data recovery, we introduce an absolute variation distance (AVD) metric for information leakage in images, derived from total variation. In contrast to standard metrics, e.g. Mean Squared Error or Structural Similarity Index, AVD offers a continuous metric for extracting information in noisy images. Finally, our empirical results on information recovery from various inversion attacks and training performance supports our defense strategies. These strategies are also shown to be useful for deep convolutional neural networks such as LeNET for image recognition. We hope that this study will help guide the development of further strategies that achieve a trustful federation policy.  ( 2 min )
    AI-Assisted Authentication: State of the Art, Taxonomy and Future Roadmap. (arXiv:2204.12492v1 [cs.CR])
    Artificial Intelligence (AI) has found its applications in a variety of environments ranging from data science to cybersecurity. AI helps break through the limitations of traditional algorithms and provides more efficient and flexible methods for solving problems. In this paper, we focus on the applications of artificial intelligence in authentication, which is used in a wide range of scenarios including facial recognition to access buildings, keystroke dynamics to unlock smartphones. With the emerging AI-assisted authentication schemes, our comprehensive survey provides an overall understanding on a high level, which paves the way for future research in this area. In contrast to other relevant surveys, our research is the first of its kind to focus on the roles of AI in authentication.  ( 2 min )
    One-shot Federated Learning without Server-side Training. (arXiv:2204.12493v1 [cs.LG])
    Federated Learning (FL) has recently made significant progress as a new machine learning paradigm for privacy protection. Due to the high communication cost of traditional FL, one-shot federated learning is gaining popularity as a way to reduce communication cost between clients and the server. Most of the existing one-shot FL methods are based on Knowledge Distillation; however, distillation based approach requires an extra training phase and depends on publicly available data sets. In this work, we consider a novel and challenging setting: performing a single round of parameter aggregation on the local models without server-side training on a public data set. In this new setting, we propose an effective algorithm for Model Aggregation via Exploring Common Harmonized Optima (MA-Echo), which iteratively updates the parameters of all local models to bring them close to a common low-loss area on the loss surface, without harming performance on their own data sets at the same time. Compared to the existing methods, MA-Echo can work well even in extremely non-identical data distribution settings where the support categories of each local model have no overlapped labels with those of the others. We conduct extensive experiments on two popular image classification data sets to compare the proposed method with existing methods and demonstrate the effectiveness of MA-Echo, which clearly outperforms the state-of-the-arts.  ( 2 min )
  • Open

    BINAS: Bilinear Interpretable Neural Architecture Search. (arXiv:2110.12399v3 [cs.LG] UPDATED)
    Practical use of neural networks often involves requirements on latency, energy and memory among others. A popular approach to find networks under such requirements is through constrained Neural Architecture Search (NAS). However, previous methods use complicated predictors for the accuracy of the network. Those predictors are hard to interpret and sensitive to many hyperparameters to be tuned, hence, the resulting accuracy of the generated models is often harmed. In this work we resolve this by introducing Bilinear Interpretable Neural Architecture Search (BINAS), that is based on an accurate and simple bilinear formulation of both an accuracy estimator and the expected resource requirement, together with a scalable search method with theoretical guarantees. The simplicity of our proposed estimator together with the intuitive way it is constructed bring interpretability through many insights about the contribution of different design choices. For example, we find that in the examined search space, adding depth and width is more effective at deeper stages of the network and at the beginning of each resolution stage. Our experiments show that BINAS generates comparable to or better architectures than other state-of-the-art NAS methods within a reduced marginal search cost, while strictly satisfying the resource constraints.  ( 2 min )
    Tensor decomposition for learning Gaussian mixtures from moments. (arXiv:2106.00555v2 [math.AG] UPDATED)
    In data processing and machine learning, an important challenge is to recover and exploit models that can represent accurately the data. We consider the problem of recovering Gaussian mixture models from datasets. We investigate symmetric tensor decomposition methods for tackling this problem, where the tensor is built from empirical moments of the data distribution. We consider identifiable tensors, which have a unique decomposition, showing that moment tensors built from spherical Gaussian mixtures have this property. We prove that symmetric tensors with interpolation degree strictly less than half their order are identifiable and we present an algorithm, based on simple linear algebra operations, to compute their decomposition. Illustrative experimentations show the impact of the tensor decomposition method for recovering Gaussian mixtures, in comparison with other state-of-the-art approaches.  ( 2 min )
    Identification of feasible pathway information for c-di-GMP binding proteins in cellulose production. (arXiv:2204.12526v1 [q-bio.QM])
    In this paper, we utilize a machine learning approach to identify the significant pathways for c-di-GMP signaling proteins. The dataset involves gene counts from 12 pathways and 5 essential c-di-GMP binding domains for 1024 bacterial genomes. Two novel approaches, Least absolute shrinkage and selection operator (Lasso) and Random forests, have been applied for analyzing and modeling the dataset. Both approaches show that bacterial chemotaxis is the most essential pathway for c-di-GMP encoding domains. Though popular for feature selection, the strong regularization of Lasso method fails to associate any pathway to MshE domain. Results from the analysis may help to understand and emphasize the supporting pathways involved in bacterial cellulose production. These findings demonstrate the need for a chassis to restrict the behavior or functionality by deactivating the selective pathways in cellulose production.  ( 2 min )
    Efficient Learning of the Parameters of Non-Linear Models using Differentiable Resampling in Particle Filters. (arXiv:2111.01409v2 [stat.ML] UPDATED)
    It has been widely documented that the sampling and resampling steps in particle filters cannot be differentiated. The {\itshape reparameterisation trick} was introduced to allow the sampling step to be reformulated into a differentiable function. We extend the {\itshape reparameterisation trick} to include the stochastic input to resampling therefore limiting the discontinuities in the gradient calculation after this step. Knowing the gradients of the prior and likelihood allows us to run particle Markov Chain Monte Carlo (p-MCMC) and use the No-U-Turn Sampler (NUTS) as the proposal when estimating parameters. We compare the Metropolis-adjusted Langevin algorithm (MALA), Hamiltonian Monte Carlo with different number of steps and NUTS. We consider two state-space models and show that NUTS improves the mixing of the Markov chain and can produce more accurate results in less computational time.  ( 2 min )
    Knowledge Transfer in Engineering Fleets: Hierarchical Bayesian Modelling for Multi-Task Learning. (arXiv:2204.12404v1 [stat.ML] CROSS LISTED)
    We propose a population-level analysis to address issues of data sparsity when building predictive models of engineering infrastructure. By sharing information between similar assets, hierarchical Bayesian modelling is used to improve the survival analysis of a truck fleet (hazard curves) and power prediction in a wind farm (power curves). In each example, a set of correlated functions are learnt over the asset fleet, in a combined inference, to learn a population model. Parameter estimation is improved when sub-fleets of assets are allowed to share correlated information at different levels in the hierarchy. In turn, groups with incomplete data automatically borrow statistical strength from those that are data-rich. The correlations can be inspected to inform which assets share information for which effect (i.e. parameter).  ( 2 min )
    Compressed sensing of low-rank plus sparse matrices. (arXiv:2007.09457v2 [math.NA] UPDATED)
    Expressing a matrix as the sum of a low-rank matrix plus a sparse matrix is a flexible model capturing global and local features in data popularized as Robust PCA (Candes et al., 2011; Chandrasekaran et al., 2009). Compressed sensing, matrix completion, and their variants (Eldar and Kutyniok, 2012; Foucart and Rauhut, 2013) have established that data satisfying low complexity models can be efficiently measured and recovered from a number of measurements proportional to the model complexity rather than the ambient dimension. This manuscript develops similar guarantees showing that $m\times n$ matrices that can be expressed as the sum of a rank-$r$ matrix and a $s$-sparse matrix can be recovered by computationally tractable methods from $\mathcal{O}(r(m+n-r)+s)\log(mn/s)$ linear measurements. More specifically, we establish that the low-rank plus sparse matrix set is closed provided the incoherence of the low-rank component is upper bounded as $\mu<\sqrt{mn}/(r\sqrt{s})$, and subsequently, the restricted isometry constants for the aforementioned matrices remain bounded independent of problem size provided $p/mn$, $s/p$, and $r(m+n-r)/p$ remain fixed. Additionally, we show that semidefinite programming and two hard threshold gradient descent algorithms, NIHT and NAHT, converge to the measured matrix provided the measurement operator's RIC's are sufficiently small. These results also provably solve convex and non-convex formulation of Robust PCA with the asymptotically optimal fraction of corruptions $\alpha=\mathcal{O}\left(1/(\mu r) \right)$, where $s = \alpha^2 mn$, and improve the previously best known guarantees by not requiring that the fraction of corruptions is spread in every column and row by being upper bounded by $\alpha$. Numerical experiments illustrating these results are shown for synthetic problems, dynamic-foreground/static-background separation, and multispectral imaging.  ( 2 min )
    Differentially Quantized Gradient Methods. (arXiv:2002.02508v4 [cs.LG] UPDATED)
    Consider the following distributed optimization scenario. A worker has access to training data that it uses to compute the gradients while a server decides when to stop iterative computation based on its target accuracy or delay constraints. The server receives all its information about the problem instance from the worker via a rate-limited noiseless communication channel. We introduce the principle we call Differential Quantization (DQ) that prescribes compensating the past quantization errors to direct the descent trajectory of a quantized algorithm towards that of its unquantized counterpart. Assuming that the objective function is smooth and strongly convex, we prove that Differentially Quantized Gradient Descent (DQ-GD) attains a linear contraction factor of $\max\{\sigma_{\mathrm{GD}}, \rho_n 2^{-R}\}$, where $\sigma_{\mathrm{GD}}$ is the contraction factor of unquantized gradient descent (GD), $\rho_n \geq 1$ is the covering efficiency of the quantizer, and $R$ is the bitrate per problem dimension $n$. Thus at any $R\geq\log_2 \rho_n /\sigma_{\mathrm{GD}}$ bits, the contraction factor of DQ-GD is the same as that of unquantized GD, i.e., there is no loss due to quantization. We show that no algorithm within a certain class can converge faster than $\max\{\sigma_{\mathrm{GD}}, 2^{-R}\}$. Since quantizers exist with $\rho_n \to 1$ as $n \to \infty$ (Rogers, 1963), this means that DQ-GD is asymptotically optimal. The principle of differential quantization continues to apply to gradient methods with momentum such as Nesterov's accelerated gradient descent, and Polyak's heavy ball method. For these algorithms as well, if the rate is above a certain threshold, there is no loss in contraction factor obtained by the differentially quantized algorithm compared to its unquantized counterpart. Experimental results on least-squares problems validate our theoretical analysis.  ( 3 min )
    Neural Collapse Inspired Attraction-Repulsion-Balanced Loss for Imbalanced Learning. (arXiv:2204.08735v2 [cs.LG] UPDATED)
    Class imbalance distribution widely exists in real-world engineering. However, the mainstream optimization algorithms that seek to minimize error will trap the deep learning model in sub-optimums when facing extreme class imbalance. It seriously harms the classification precision, especially on the minor classes. The essential reason is that the gradients of the classifier weights are imbalanced among the components from different classes. In this paper, we propose Attraction-Repulsion-Balanced Loss (ARB-Loss) to balance the different components of the gradients. We perform experiments on the large-scale classification and segmentation datasets and our ARB-Loss can achieve state-of-the-art performance via only one-stage training instead of 2-stage learning like nowadays SOTA works.  ( 2 min )
    The Multimarginal Optimal Transport Formulation of Adversarial Multiclass Classification. (arXiv:2204.12676v1 [cs.LG])
    We study a family of adversarial multiclass classification problems and provide equivalent reformulations in terms of: 1) a family of generalized barycenter problems introduced in the paper and 2) a family of multimarginal optimal transport problems where the number of marginals is equal to the number of classes in the original classification problem. These new theoretical results reveal a rich geometric structure of adversarial learning problems in multiclass classification and extend recent results restricted to the binary classification setting. A direct computational implication of our results is that by solving either the barycenter problem and its dual, or the MOT problem and its dual, we can recover the optimal robust classification rule and the optimal adversarial strategy for the original adversarial problem. Examples with synthetic and real data illustrate our results.
    Double Diffusion Maps and their Latent Harmonics for Scientific Computations in Latent Space. (arXiv:2204.12536v1 [stat.ML])
    We introduce a data-driven approach to building reduced dynamical models through manifold learning; the reduced latent space is discovered using Diffusion Maps (a manifold learning technique) on time series data. A second round of Diffusion Maps on those latent coordinates allows the approximation of the reduced dynamical models. This second round enables mapping the latent space coordinates back to the full ambient space (what is called lifting); it also enables the approximation of full state functions of interest in terms of the reduced coordinates. In our work, we develop and test three different reduced numerical simulation methodologies, either through pre-tabulation in the latent space and integration on the fly or by going back and forth between the ambient space and the latent space. The data-driven latent space simulation results, based on the three different approaches, are validated through (a) the latent space observation of the full simulation through the Nystr\"om Extension formula, or through (b) lifting the reduced trajectory back to the full ambient space, via Latent Harmonics. Latent space modeling often involves additional regularization to favor certain properties of the space over others, and the mapping back to the ambient space is then constructed mostly independently from these properties; here, we use the same data-driven approach to construct the latent space and then map back to the ambient space.
    Bounded Memory Adversarial Bandits with Composite Anonymous Delayed Feedback. (arXiv:2204.12764v1 [cs.LG])
    We study the adversarial bandit problem with composite anonymous delayed feedback. In this setting, losses of an action are split into $d$ components, spreading over consecutive rounds after the action is chosen. And in each round, the algorithm observes the aggregation of losses that come from the latest $d$ rounds. Previous works focus on oblivious adversarial setting, while we investigate the harder non-oblivious setting. We show non-oblivious setting incurs $\Omega(T)$ pseudo regret even when the loss sequence is bounded memory. However, we propose a wrapper algorithm which enjoys $o(T)$ policy regret on many adversarial bandit problems with the assumption that the loss sequence is bounded memory. Especially, for $K$-armed bandit and bandit convex optimization, we have $\mathcal{O}(T^{2/3})$ policy regret bound. We also prove a matching lower bound for $K$-armed bandit. Our lower bound works even when the loss sequence is oblivious but the delay is non-oblivious. It answers the open problem proposed in \cite{wang2021adaptive}, showing that non-oblivious delay is enough to incur $\tilde{\Omega}(T^{2/3})$ regret.
    Performance and Interpretability Comparisons of Supervised Machine Learning Algorithms: An Empirical Study. (arXiv:2204.12868v1 [stat.ML])
    This paper compares the performances of three supervised machine learning algorithms in terms of predictive ability and model interpretation on structured or tabular data. The algorithms considered were scikit-learn implementations of extreme gradient boosting machines (XGB) and random forests (RFs), and feedforward neural networks (FFNNs) from TensorFlow. The paper is organized in a findings-based manner, with each section providing general conclusions supported by empirical results from simulation studies that cover a wide range of model complexity and correlation structures among predictors. We considered both continuous and binary responses of different sample sizes. Overall, XGB and FFNNs were competitive, with FFNNs showing better performance in smooth models and tree-based boosting algorithms performing better in non-smooth models. This conclusion held generally for predictive performance, identification of important variables, and determining correct input-output relationships as measured by partial dependence plots (PDPs). FFNNs generally had less over-fitting, as measured by the difference in performance between training and testing datasets. However, the difference with XGB was often small. RFs did not perform well in general, confirming the findings in the literature. All models exhibited different degrees of bias seen in PDPs, but the bias was especially problematic for RFs. The extent of the biases varied with correlation among predictors, response type, and data set sample size. In general, tree-based models tended to over-regularize the fitted model in the tails of predictor distributions. Finally, as to be expected, performances were better for continuous responses compared to binary data and with larger samples.
    Transfer Learning with Pre-trained Conditional Generative Models. (arXiv:2204.12833v1 [cs.LG])
    Transfer learning is crucial in training deep neural networks on new target tasks. Current transfer learning methods generally assume at least one of (i) source and target task label spaces must overlap, (ii) source datasets are available, and (iii) target network architectures are consistent with source ones. However, these all assumptions are difficult to hold in practical settings because the target task rarely has the same labels as the source task, the source dataset access is restricted due to licensing and storage costs, and the target architecture is often specialized to each task. To transfer source knowledge without these assumptions, we propose a transfer learning method that uses deep generative models and is composed of the following two stages: pseudo pre-training (PP) and pseudo semi-supervised learning (P-SSL). PP trains a target architecture with a synthesized dataset by using conditional source generative models. P-SSL applies SSL algorithms to labeled target data and unlabeled pseudo samples, which are generated by cascading the source classifier and generative models to condition them with target samples. Our experimental results indicate that our method can outperform baselines of scratch training and knowledge distillation.
    An Empirical Study of the Occurrence of Heavy-Tails in Training a ReLU Gate. (arXiv:2204.12554v1 [cs.LG])
    A particular direction of recent advance about stochastic deep-learning algorithms has been about uncovering a rather mysterious heavy-tailed nature of the stationary distribution of these algorithms, even when the data distribution is not so. Moreover, the heavy-tail index is known to show interesting dependence on the input dimension of the net, the mini-batch size and the step size of the algorithm. In this short note, we undertake an experimental study of this index for S.G.D. while training a $\relu$ gate (in the realizable and in the binary classification setup) and for a variant of S.G.D. that was proven in Karmakar and Mukherjee (2022) for ReLU realizable data. From our experiments we conjecture that these two algorithms have similar heavy-tail behaviour on any data where the latter can be proven to converge. Secondly, we demonstrate that the heavy-tail index of the late time iterates in this model scenario has strikingly different properties than either what has been proven for linear hypothesis classes or what has been previously demonstrated for large nets.
    Variational Kalman Filtering with Hinf-Based Correction for Robust Bayesian Learning in High Dimensions. (arXiv:2204.13089v1 [stat.ML])
    In this paper, we address the problem of convergence of sequential variational inference filter (VIF) through the application of a robust variational objective and Hinf-norm based correction for a linear Gaussian system. As the dimension of state or parameter space grows, performing the full Kalman update with the dense covariance matrix for a large scale system requires increased storage and computational complexity, making it impractical. The VIF approach, based on mean-field Gaussian variational inference, reduces this burden through the variational approximation to the covariance usually in the form of a diagonal covariance approximation. The challenge is to retain convergence and correct for biases introduced by the sequential VIF steps. We desire a framework that improves feasibility while still maintaining reasonable proximity to the optimal Kalman filter as data is assimilated. To accomplish this goal, a Hinf-norm based optimization perturbs the VIF covariance matrix to improve robustness. This yields a novel VIF- Hinf recursion that employs consecutive variational inference and Hinf based optimization steps. We explore the development of this method and investigate a numerical example to illustrate the effectiveness of the proposed filter.  ( 2 min )
    Accurate inference of crowdsourcing properties when using efficient allocation strategies. (arXiv:1903.03104v2 [cs.LG] UPDATED)
    Allocation strategies improve the efficiency of crowdsourcing by decreasing the work needed to complete individual tasks accurately. However, these algorithms introduce bias by preferentially allocating workers onto easy tasks, leading to sets of completed tasks that are no longer representative of all tasks. This bias challenges inference of problem-wide properties such as typical task difficulty or crowd properties such as worker completion times, important information that goes beyond the crowd responses themselves. Here we study inference about problem properties when using an allocation algorithm to improve crowd efficiency. We introduce Decision-Explicit Probability Sampling (DEPS), a novel method to perform inference of problem properties while accounting for the potential bias introduced by an allocation strategy. Experiments on real and synthetic crowdsourcing data show that DEPS outperforms baseline inference methods while still leveraging the efficiency gains of the allocation method. The ability to perform accurate inference of general properties when using non-representative data allows crowdsourcers to extract more knowledge out of a given crowdsourced dataset.  ( 2 min )
    Faster online calibration without randomization: interval forecasts and the power of two choices. (arXiv:2204.13087v1 [cs.LG])
    We study the problem of making calibrated probabilistic forecasts for a binary sequence generated by an adversarial nature. Following the seminal paper of Foster and Vohra (1998), nature is often modeled as an adaptive adversary who sees all activity of the forecaster except the randomization that the forecaster may deploy. A number of papers have proposed randomized forecasting strategies that achieve an $\epsilon$-calibration error rate of $O(1/\sqrt{T})$, which we prove is tight in general. On the other hand, it is well known that it is not possible to be calibrated without randomization, or if nature also sees the forecaster's randomization; in both cases the calibration error could be $\Omega(1)$. Inspired by the equally seminal works on the "power of two choices" and imprecise probability theory, we study a small variant of the standard online calibration problem. The adversary gives the forecaster the option of making two nearby probabilistic forecasts, or equivalently an interval forecast of small width, and the endpoint closest to the revealed outcome is used to judge calibration. This power of two choices, or imprecise forecast, accords the forecaster with significant power -- we show that a faster $\epsilon$-calibration rate of $O(1/T)$ can be achieved even without deploying any randomization.  ( 2 min )
    Scalable particle-based alternatives to EM. (arXiv:2204.12965v1 [stat.CO])
    Building on (Neal and Hinton, 1998), where the problem tackled by EM is recast as the optimization of a free energy functional on an infinite-dimensional space, we obtain three practical particle-based alternatives to EM applicable to broad classes of models. All three are derived through straightforward discretizations of gradient flows associated with the functional. The novel algorithms scale well to high-dimensional settings and outperform existing state-of-the-art methods in numerical experiments.  ( 2 min )
    Closing the Gap between Single-User and Multi-User VoiceFilter-Lite. (arXiv:2202.12169v2 [eess.AS] UPDATED)
    VoiceFilter-Lite is a speaker-conditioned voice separation model that plays a crucial role in improving speech recognition and speaker verification by suppressing overlapping speech from non-target speakers. However, one limitation of VoiceFilter-Lite, and other speaker-conditioned speech models in general, is that these models are usually limited to a single target speaker. This is undesirable as most smart home devices now support multiple enrolled users. In order to extend the benefits of personalization to multiple users, we previously developed an attention-based speaker selection mechanism and applied it to VoiceFilter-Lite. However, the original multi-user VoiceFilter-Lite model suffers from significant performance degradation compared with single-user models. In this paper, we devised a series of experiments to improve the multi-user VoiceFilter-Lite model. By incorporating a dual learning rate schedule and by using feature-wise linear modulation (FiLM) to condition the model with the attended speaker embedding, we successfully closed the performance gap between multi-user and single-user VoiceFilter-Lite models on single-speaker evaluations. At the same time, the new model can also be easily extended to support any number of users, and significantly outperforms our previously published model on multi-speaker evaluations.  ( 2 min )
    IH-GAN: A Conditional Generative Model for Implicit Surface-Based Inverse Design of Cellular Structures. (arXiv:2103.02588v4 [cs.CE] UPDATED)
    Variable-density cellular structures can overcome connectivity and manufacturability issues of topologically optimized structures, particularly those represented as discrete density maps. However, the optimization of such cellular structures is challenging due to the multiscale design problem. Past work addressing this problem generally either only optimizes the volume fraction of single-type unit cells but ignores the effects of unit cell geometry on properties, or considers the geometry-property relation but builds this relation via heuristics. In contrast, we propose a simple yet more principled way to accurately model the property to geometry mapping using a conditional deep generative model, named Inverse Homogenization Generative Adversarial Network (IH-GAN). It learns the conditional distribution of unit cell geometries given properties and can realize the one-to-many mapping from properties to geometries. We further reduce the complexity of IH-GAN by using the implicit function parameterization to represent unit cell geometries. Results show that our method can 1) generate various unit cells that satisfy given material properties with high accuracy ($R^2$-scores between target properties and properties of generated unit cells $>98\%$) and 2) improve the optimized structural performance over the conventional variable-density single-type structure. In the minimum compliance example, our IH-GAN generated structure achieves a $79.7\%$ reduction in concentrated stress and an extra $3.03\%$ reduction in displacement. In the target deformation examples, our IH-GAN generated structure reduces the target matching error by $86.4\%$ and $79.6\%$ for two test cases, respectively. We also demonstrated that the connectivity issue for multi-type unit cells can be solved by transition layer blending.  ( 3 min )
    Forecasting Foreign Exchange Rates With Parameter-Free Regression Networks Tuned By Bayesian Optimization. (arXiv:2204.12914v1 [q-fin.ST])
    The article is concerned with the problem of multi-step financial time series forecasting of Foreign Exchange (FX) rates. To address this problem, we introduce a parameter-free regression network termed RegPred Net. The exchange rate to forecast is treated as a stochastic process. It is assumed to follow a generalization of Brownian motion and the mean-reverting process referred to as the generalized Ornstein-Uhlenbeck (OU) process, with time-dependent coefficients. Using past observed values of the input time series, these coefficients can be regressed online by the cells of the first half of the network (Reg). The regressed coefficients depend only on - but are very sensitive to - a small number of hyperparameters required to be set by a global optimization procedure for which, Bayesian optimization is an adequate heuristic. Thanks to its multi-layered architecture, the second half of the regression network (Pred) can project time-dependent values for the OU process coefficients and generate realistic trajectories of the time series. Predictions can be easily derived in the form of expected values estimated by averaging values obtained by Monte Carlo simulation. The forecasting accuracy on a 100 days horizon is evaluated for several of the most important FX rates such as EUR/USD, EUR/CNY, and EUR/GBP. Our experimental results show that the RegPred Net significantly outperforms ARMA, ARIMA, LSTMs, and Autoencoder-LSTM models in this task.  ( 2 min )
    On the Dynamics of Inference and Learning. (arXiv:2204.12939v1 [cond-mat.dis-nn])
    Statistical Inference is the process of determining a probability distribution over the space of parameters of a model given a data set. As more data becomes available this probability distribution becomes updated via the application of Bayes' theorem. We present a treatment of this Bayesian updating process as a continuous dynamical system. Statistical inference is then governed by a first order differential equation describing a trajectory or flow in the information geometry determined by a parametric family of models. We solve this equation for some simple models and show that when the Cram\'{e}r-Rao bound is saturated the learning rate is governed by a simple $1/T$ power-law, with $T$ a time-like variable denoting the quantity of data. The presence of hidden variables can be incorporated in this setting, leading to an additional driving term in the resulting flow equation. We illustrate this with both analytic and numerical examples based on Gaussians and Gaussian Random Processes and inference of the coupling constant in the 1D Ising model. Finally we compare the qualitative behaviour exhibited by Bayesian flows to the training of various neural networks on benchmarked data sets such as MNIST and CIFAR10 and show how that for networks exhibiting small final losses the simple power-law is also satisfied.  ( 2 min )
    First do no harm: counterfactual objective functions for safe & ethical AI. (arXiv:2204.12993v1 [cs.AI])
    To act safely and ethically in the real world, agents must be able to reason about harm and avoid harmful actions. In this paper we develop the first statistical definition of harm and a framework for factoring harm into algorithmic decisions. We argue that harm is fundamentally a counterfactual quantity, and show that standard machine learning algorithms are guaranteed to pursue harmful policies in certain environments. To resolve this, we derive a family of counterfactual objective functions that robustly mitigate for harm. We demonstrate our approach with a statistical model for identifying optimal drug doses. While identifying optimal doses using the causal treatment effect results in harmful treatment decisions, our counterfactual algorithm identifies doses that are far less harmful without sacrificing efficacy. Our results show that counterfactual reasoning is a key ingredient for safe and ethical AI.  ( 2 min )

  • Open

    [P] Auto-encoder image dimension error
    Hello, I have a problem regarding my auto-encoder model. I built it so it can be able to give a MSE error abnormally high when presented an image too different from what it knows. I have this little function at the end of my program to predict whether or not the image presented is an anomaly : ` def check_anomaly(img_path): density_threshold = 2500 #Set this value based on the above exercise reconstruction_error_threshold = 0.004 # Set this value based on the above exercise img = Image.open(img_path) img = np.array(img.resize((128,128), Image.ANTIALIAS)) plt.imshow(img) img = img / 255. img = img[np.newaxis, :,:,:] encoded_img = encoder_model.predict([[img]]) encoded_img = [np.reshape(img, (out_vector_shape)) for img in encoded_img] density = kde.score_samples(encoded_img)[0] reconstruc…  ( 1 min )
    What's the actual difference in simple terms between Cloudera Data Science Workbench, DataRobot, and DataBricks? [Discussion]
    I've also noticed Cloudera and DataRobot seem to follow user pricing where DataBricks follows the standard hourly machine rate. submitted by /u/SmarterChild8675309 [link] [comments]
    [Project] Multi-Bounding box identification
    I have a data set of satellite images of aircraft. For the labels of these images I have the bounding boxes for all aircraft in that image. I looking for some advice on where to get started for this. I need to estimate all bounding boxes for these aircraft, any suggestions? submitted by /u/The_Dov [link] [comments]
    [D] NLP: anyone familiar with taxonomy extraction and evaluation?
    Hi all, does anyone have experience with automatic taxonomy/ontology extraction from unlabelled corpus, especially how to evaluate the extracted structures without gold standards? most published papers would invite students/researchers to conduct manual reviews, thus making it very difficult to compare the results. thanks in advance. submitted by /u/CestLucas [link] [comments]
    [D] Precision-Recall curve with best F1 Score of 0.56
    I am evaluating the performance of a model. I get different Precison-Recall curves for different configurations and the best F1 score (0.56) corresponds to the red point. Would you consider this performance acceptable? I know there is a lot of room for improvement but is it okay or is it extremely poor performance? https://preview.redd.it/k5qihccdb5w81.png?width=784&format=png&auto=webp&s=a0f3408f2ff671ee491faa50752b6c8e0cf17e4c Also, if I understood it correctly, each point of the Precision-Recall curve has its own F1 Score, right? Thank you! submitted by /u/SeaResponsibility176 [link] [comments]  ( 1 min )
    [P] surgeon-pytorch – a small library to inspect intermediate layers of pyTorch models
    I've made surgeon https://github.com/archinetai/surgeon-pytorch, a small library to inspect the intermediate output layers of pyTorch models without changing the original implementation. This can be very useful if you are using pre-trained models (e.g. from Huggingface or torch.hub) and want to get embeddings, attention matrices, or simply debug the model without adding additional code – which is often hard to do without changing the implementation. I hope this can be helpful to anyone! submitted by /u/Aglitter [link] [comments]  ( 1 min )
    [D] Thoughts on the AI4 conference?
    Lately, I have been receiving many messages from the organizers of this conference that they have some free passes to pass on to attend it: https://ai4.io/usa/ While the speaker line-up from the industry appears impressive, I have not heard of this conference before. Any thoughts on how legit/good this conference is? submitted by /u/roalddahl14 [link] [comments]  ( 1 min )
    [D] Has anyone attempted to recreate the work describe in the Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry Paper?
    The D3VO paper from the Tech. University of Munich looks quite promising, however that Uni does not seem very keen on publishing code along with their papers. Has already tried to reproduce their work? What kind of results did you see in your version? What hardware did you run it on? How difficult was it reproduce their models in pytorch? submitted by /u/autojazari [link] [comments]  ( 1 min )
    [D] - Are GFlow Nets considered diffusion models?
    I stumbled upon GFlow Net and in my opinion, it looks very similar to diffusion models. There is a touch of RL in GFlow Net but the main idea is very similar to diffusion models. is that right? or am I missing something? submitted by /u/dimem16 [link] [comments]
    [D] How is it checked if models do not just memorize their training examples?
    Hello everyone, this post is about generative models! (i.e. Score-based-generative models, GANs, etc.) on leaderboards like this https://paperswithcode.com/sota/image-generation-on-cifar-10 How do they check if the models do not just memorize the training examples? The FID score would be optimal in case you would just generate training examples again. Best submitted by /u/future_gcp_poweruser [link] [comments]  ( 5 min )
    [D] ICLR 2022 blog post track
    It seems like the list of accepted blog posts has been published for the ICLR 2022 blog post track (https://iclr-blog-track.github.io/). They also invited Karpathy to publish his most recent blog post on the history and future of convnets. What do you think about the blog posts? Any blog posts that you particularly like or that are definitely worth reading? Any blog posts that actually have interesting contributions and/or that you plan to cite? But maybe more importantly: how do you think this will evolve? They seem to have decided to organise the blog post track again for next year's ICLR already. Do you think these kind of publications could have an impact on how we do science or is this rather a nice extra on top of scientific work? There has only been one post in this subreddit on one of the accepted blog posts (since the announcement). I think there are some nice blog posts in there, so I expected to find some discussions here, but it seems like it is either ignored or not worth discussing. Therefore, I thought it would be interesting to (try to) start a discussion. TL;DR: what do you think of the ICLR blog post track and/or the accepted blog posts? submitted by /u/mr_tsjolder [link] [comments]  ( 1 min )
    [D] MLOps tools for automatic fine tuning of deployed machine learning models
    I'm working on a ML model for data extraction out of documents. The model is trained and deployed into production. For fine tuning the model and improving performance I added the possibility for users to correct the extracted data. A concrete example would be: the model labels a word in the document as "company_name" the user corrects it to "street_name". This correction is then used to fine tune the model. Currently the fine tuning is done manually. That is, if the number of corrections exceeds some threshold I take them and start a new training and evaluate the new model before putting it into production. My question would be: is there an MLOps tool that automates this process? Or should I write one myself? I am aware of the tool seldon core that offers A/B testing for comparing the old model with the new one and putting it into production. But unfortunately it does not offer automatic fine tuning. Or that's what I understood from their website. submitted by /u/alzoubi36 [link] [comments]  ( 1 min )
    [P] Create interactive slides for Machine Learning models from Jupyter Notebook
    Would you like to create an interactive presentation for your ML model directly from Jupyter Notebook? I'm working on an open-source project for converting notebooks into interactive documents. Recently, I've added the option to turn notebooks into interactive slides. You can showcase your Machine Learning model as an interactive presentation. During the presentation, the user can change values and recompute the slide! I've created a demo presentation where I use Random Forest to predict Iris species. The screenshot recording of the presentation: https://github.com/pplonski/ml-model-slides/raw/main/media/slides-from-ml-model.gif The presentation is available online (deployed at Heroku) https://ml-model-presentation.herokuapp.com/ What is more, the presentation code is on GitHub https://github.com/pplonski/ml-model-slides (yes, code is presentation! bye-bye PPT) In case anyone is interested, the framework is called Mercury. submitted by /u/pp314159 [link] [comments]  ( 1 min )
    [D] Feature engineering automation?
    Hello, I’m working as a Data Scientist currently and I realized that most of my time is spent on feature engineering. My general practice is that I create aggregations of data (via sql because of the amount of data that needs to be processed) like sum, mean, avg, std, median, q25, q75. I need to do it on a few dozen features. Also I am calculating these aggregations on different time windows: previous week, previous month, previous 3 month. At the end I end up with hundreds of features and I need to select the ones that make any sense, contain relevant information. Currently I am applying pandas profiling, or sweetviz on this huge dataset and trying to analyze it by eyeballing the results. My main challenge is that this process is highly repetitive and manual. I am wondering if there is any tool out there that could help me automate this process and make some parts reusable? I like having a UI especially for visualizing the data. Am I doing something wrong or is there a tool that I’m clearly not aware of? submitted by /u/sgergely [link] [comments]  ( 4 min )
    [D] How to store/surface predictions along with immutable data in a database api?
    Hi, I am faced with something that I thought was simpe but I have been thinking about it so much that I now very confused. Any suggestions are helpful ! Let's assume the scenario that you have a database of cars. For each car you have a brand, model, colour etc. You have an api for this db which you can call, with a car id, and surface the car data. People rely on this api for fast and up to date car information. You have 1 million cars in the db. Assume now there is a requirement to predict a yes/no if the car has a black front bumper (completely made up requirement). This has to be surfaced to the api users along with the car data. You have built a classifier for this, and it takes some time to surface a prediction, e.g. 2 mins. You select what you think is the best operating point for your classifier. When a new car gets added into the db you can now run your predictor and you get back a probability and depending on the operating point a yes/no. What is the best approach here now ? Do you store the result of this prediction in the database (probability, model_version and boolean result) alongside the rest of the car data ? Less flexible as if you decide to tweak the operating point you will need to recalculate the new boolean values for all cars - but you have consistency wrt to what you have in the db and what you return to your users. If there is a mistake you can readily fix it by altering the boolean field. Do you just store the probabilities and decide on the yes/no using the model_version & operating point only when you surface the data to the end user ? More flexible if you decide to tweak the opearting point of the classifier. or would you have a completely different approach ? Would you change anything, if for some cars you have the ground truth data of they have a black bumper or not in the db already ? Thanks for reading ! submitted by /u/42isthenumber_ [link] [comments]  ( 2 min )
  • Open

    [2202.12742] Learning Relative Return Policies With Upside-Down Reinforcement Learning
    submitted by /u/chimp73 [link] [comments]
    What does it mean to centralise the observation in MARL?
    submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    Off-policy algorithm with batch of actions
    Hello everyone, first of all sorry for my poor English. I am fairly new to Deep RL, and RL in general. I want to implement custom environment with multiple robots in it. Each robot would be given same task to do, so what I am trying to do is simulate parallelism. My question is: if i have 100 robots in simulation, do i have to instantiate 100 agents (neural networks) to control those robots or single network with batch size of 100 will suffice? Does batch of observations in anyway disturbs agent action (network) output? It would be much more memory efficient with single neural network. So far, I've seen that in process of acting in env agent takes observation of batch size 1 and outputs corresponding action for that observation. Since each observation from the batch is propagated trough the network without calculating gradient it should not be affected by the batch size. If anyone could explain to me if my way of thinking is wrong, and why. :) submitted by /u/Dexter_fixxor [link] [comments]  ( 1 min )
    "NeuPL: Neural Population Learning", Liu et al 2022 (encoding PBT agents into a single multi-policy agent)
    submitted by /u/gwern [link] [comments]  ( 1 min )
    How do Actor-Critic networks reduce the variance compared to other PG method like Reinforce ?
    i understand about why REINFORCE has high variance but how does AC mitigates it ? submitted by /u/aabra__ka__daabra [link] [comments]  ( 1 min )
    on-policy vs off-policy
    I'm looking to find the concrete explanation and difference of on-policy and off-policy learning strategy if possible mathematics can alao be explained. submitted by /u/Western-Age3148 [link] [comments]  ( 1 min )
    Open position at ZF Friedrichshafen AG- Algorithmenentwickler AI & ML
    Feel free to apply! Algorithmenentwickler-AI-&-Machine-Learning-Motion-Planning submitted by /u/gab_ma [link] [comments]
    Internships/Thesis in the field of AI @ZF
    We are currently offering Internships & Theses in the Field of Artificial Intelligence. Here are the links to our open positions. Feel free to share the post! ✌🏽 Mandatory Internship Software Development in the field of Artificial Intelligence Pflichtpraktikant Reinforcement Learning Algorithmen Pflichtpraktikum/ Masterarbeit: Reinforcement Learning submitted by /u/gab_ma [link] [comments]  ( 1 min )
    General questions for those of you up to date on the topic.
    What does the future of deep reinforcement learning look like? I feel like people were pretty hyped about it a year or two ago. Is it heavily researched now and expected to be used more in the future? What are some real world tasks that it can help with? I've seen it used a lot for playing games, self driving cars, and manufacturing robotics. Anything else that it can be applied to? Will it likely be used for autonomous robots when they become more common? What are some areas that can be improved upon? What are the most recent advancements and what is a good topic to focus on for future research? Thank you. submitted by /u/johnGettings [link] [comments]  ( 2 min )
  • Open

    When will humorous AIs press our buttons with their jokes? | Psyche Ideas
    submitted by /u/estasfuera [link] [comments]
    Best AI for blog post creation? Or what tools do you use to get an outline faster?
    I've only every used writesonic.com I got premium acct access for free. I love it for when u need to quickly spin an article if I'm on a backlink building spree. Haven't tested against others but overall good, and improvements we being made on it all the time (in active development). submitted by /u/CliffWoolum [link] [comments]  ( 1 min )
    Showcase your ML model in a python Web GUI
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    John Deere is becoming one of the world's most important AI companies
    submitted by /u/estasfuera [link] [comments]  ( 1 min )
    I keep getting terrible advice from AI assistants. Help me rate the worst and the best responses :)
    I was playing around with several AI models and this sort of stuff happens all the time. https://preview.redd.it/6arq56nih3w81.png?width=1163&format=png&auto=webp&s=5ca68dce977ae671b0b928c2f1ec3cdd8478257f I decided to prepare a compilation and rank the answers. Can you help me with deciding how bad or good are some of the answers by taking a survey here? Edit: I apologize if some of you felt tricked into completing the survey by the original version of the post. It does have about 50 questions but they are mostly just yes/no, good/bad, so it shouldn't take longer than 10 minutes. I intend to write a piece about how AI assistants are doing and prepare a compilation of AI fails. I will share the results :) submitted by /u/KazRainer [link] [comments]  ( 1 min )
    Is there a dataset for personal items?
    Hi! Im looking for a dataset containing images of personal items (Wallet, keys, phone etc), annotated by bounding boxes. Cant seem to find anything, do anyone know of such a dataset? Thanks in advance! submitted by /u/ifinty [link] [comments]  ( 1 min )
    Can I get AI language bots, who've already been trained?
    I'm brand new to programming. I am thinking I need a bot to skim through pdf's and through trial and error, I can take the pdf's and make the bot come up with suggestions to scripts or books.However I need to use an AI which already has training in English grammatical and sentence structure and langue/information flow. not nescceecarily interpretation meaning or philosophy or other higher levels of language proficiency. Say I wanted to write a novel about sailors. I'd let it skim some 100 novels on sailing, and generate inputs. then I'd match it up against a separate set of Ai to fact check each other. or even with form blogs from some sailing subreddits or news feeds. I could do this with any topic. medicine, engineering, biology, drama novels etc. Are there any free/opensource bots who already know the English language and might read books and make suggestions based on guided inputs? submitted by /u/International__ [link] [comments]  ( 3 min )
    Edit Images Using Sketches! NVIDIA EditGAN Explained. Control any feature from quick drafts
    submitted by /u/OnlyProggingForFun [link] [comments]  ( 1 min )
    Train Test Split in Way Too Much Depth
    submitted by /u/mgalarny [link] [comments]
    AI can't help blind people like me with everything... But it certainly can help me find my clothes
    submitted by /u/thisisjoshtseng [link] [comments]  ( 1 min )
    guess its still soon to ask bots about anything feeling related
    ​ https://preview.redd.it/jbvwyd7u41w81.png?width=393&format=png&auto=webp&s=6cbd5568a84c4d96c03ca9a99bbed73b1576a1d9 any bots that can answer better? submitted by /u/MalwareLord [link] [comments]
    How do you “get your own AI bot?”
    I hear about people who trained or fed an AI bot to say, write a Biden Speech, or create poetry. I want to create a bot that pulls important numbers from my local community to post to social media — like lowest gas prices in town, weather average, mortgage interest rates and a couple other things. submitted by /u/No-Setting2541 [link] [comments]  ( 1 min )
    Artificial Nightmares: Disney Princess Ella || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
    Thinking of starting a personal project to filter profanity, wanted to air out my plan for recommendations before starting
    So, I'm not super comfortable with swearing, but there are a few YouTubers I follow that are as vulgar as they are hilarious. My plan, as it stands, is to use python in concert with some sort of generic voice to text software to get a list of timestamps for any F words, and then use that list of timestamps to generate an FFMPEG command to cut the audio at each timestamp and remerge into a new video. Does anyone have any recommendations for audio processing packages or better approaches? I'm keen to use anything with hardware acceleration (because making a script to abuse my shiny new graphics card is half my motivation). P.S. I haven't touched python much since graduating, can't wait to dig my claws back in. Although... if anyone has an audio processing package in C# that would be hecking cool too Have a nice day y'all submitted by /u/The_Real_Slim_Lemon [link] [comments]  ( 1 min )
  • Open

    Google Colab for Machine Learning Projects
    Have you ever wanted an easy-to-configure interactive environment to run your machine learning code that came with access to GPUs for free? Google Colab is the answer you’ve been looking for. It is a convenient and easy to use way to run Jupyter notebooks on the cloud and their free version comes with some limited […] The post Google Colab for Machine Learning Projects appeared first on Machine Learning Mastery.  ( 13 min )
  • Open

    Machine learning, harnessed to extreme computing, aids fusion energy development
    Linking techniques from machine learning with advanced numerical simulations, MIT researchers take an important step in state-of-the-art predictions for fusion plasmas.  ( 6 min )
  • Open

    MoLeR: Creating a path to more efficient drug design
    Drug discovery has come a long way from its roots in serendipity. It is now an increasingly rational process, in which one important phase, called lead optimization, is the stepwise search for promising drug candidate compounds in the lab. In this phase, expert medicinal chemists work to improve “hit” molecules—compounds that demonstrate some promising properties, […] The post MoLeR: Creating a path to more efficient drug design appeared first on Microsoft Research.  ( 6 min )
  • Open

    Answers Blowin’ in the Wind: HPC Code Gives Renewable Energy a Lift
    A hundred and forty turbines in the North Sea — and some GPUs in the cloud — pumped wind under the wings of David Standingford and Jamil Appa’s dream. As colleagues at a British aerospace firm, they shared a vision of starting a company to apply their expertise in high performance computing across many industries. Read article > The post Answers Blowin’ in the Wind: HPC Code Gives Renewable Energy a Lift appeared first on NVIDIA Blog.  ( 4 min )
    What Is Conversational AI? ZeroShot Bot CEO Jason Mars Explains
    Entrepreneur Jason Mars calls conversation our “first technology.” Before humans invented the wheel, crafted a spear or tamed fire, we mastered the superpower of talking to one another. That makes conversation an incredibly important tool. But if you’ve dealt with the automated chatbots deployed by the customer service arms of just about any big organization Read article > The post What Is Conversational AI? ZeroShot Bot CEO Jason Mars Explains appeared first on NVIDIA Blog.  ( 2 min )
  • Open

    TinyML, an underrated field of Machine Learning
    TinyML is a groundbreaking technology! Possessing a lot of potential it is sure to grow exponentially in the coming years.  ( 3 min )
  • Open

    Calculating where projective lines intersect
    A couple days ago I wrote about homogeneous coordinates projective planes. I said that the lines y = 5 and y = 6 intersect in a point “at infinity.” In projective geometry any two distinct lines intersect in exactly one point, and you can compute that intersection point the same way, whether the intersection is […] Calculating where projective lines intersect first appeared on John D. Cook.  ( 4 min )

  • Open

    [D] Transformer-Models-from-Scratch
    I recently started learning machine learning, and I have implemented several transformer models for different tasks from scratch in PyTorch in my Github repository: Transformer-Models-from-Scratch The notebooks are self-contained. And I also included a note I wrote on transformers. Hope it's helpful for anyone learning the transformer model! Let me know if you have any comments! submitted by /u/hbchen-one [link] [comments]
    [P] Curated List of Company Blogs about MLops/ Infra
    Hi all. I am starting a github repo to compile a list of company blogs about their MLops/ infra. Please feel free to contribute if you are interested: https://github.com/enochkan/awesome-ml-stack submitted by /u/kanxx030 [link] [comments]
    [P] TorToiSe - a true zero-shot multi-voice TTS engine
    I'd like to show off a TTS system I have been working on for the past year. I've open-sourced all the code and the trained model weights: https://github.com/neonbjb/tortoise-tts This was born out of a desire to reproduce the original DALLE with speech. It is "zero-shot" because you feed the text and examples of a voice to mimic as prompts to an autoregressive LLM. I think the results are fantastic. Here are some samples: https://nonint.com/static/tortoise_v2_examples.html Here is a colab in which you can try out the whole system: https://colab.research.google.com/drive/1wVVqUPqwiDBUVeWWOUNglpGhU3hg_cbR submitted by /u/neonbjb [link] [comments]  ( 1 min )
    [P] Baseten – Build ML-powered applications
    Hey, we've been building Baseten to be able quickly deploy models, backends and frontends. I'd love to get your feedback. submitted by /u/Available-Cookie2754 [link] [comments]
    [N] Upcoming talk on Data centric approach to AI from experience at Youtube, ScaleAI and Apple
    Hey folks, There’s an upcoming free talk on May 13. This is what I know: Vijay K, Head of Engineering at Scale.ai, and Mike Wu, Stanford PhD in Machine Learning are going to be talking about strategies for taking a data centric approach to AI, and Vijay’s lessons from doing this at Apple, YouTube, and Scale AI. There’s a a renewed focus on the data layer as a foundation for successful ML projects, and Vijay participated in this transformation firsthand. You’ll be able to hear his reflections and learnings, should be super useful! Sign-up link is here, see you there. submitted by /u/sb2nov [link] [comments]  ( 1 min )
    [News] Adept AI Labs Launches
    This seems like a pretty powerhouse team adept.ai/post/introducing-adept submitted by /u/mrpogiface [link] [comments]
    [D] Is it possible to train a very large 6B model learn from a single training input 1 2 3 to infer 6 7 8 (9) ?
    Hi, I was wondering how long it would take or if it would be possible for a model with 6 billion neurons to become able to predict that for 6 7 8 the next number should be 9, given only a single training example 1 2 3. Let's say we use an architecture like the ones used in GPT. Essentially, I am trying to make sense of something here: is the human design component in DL the weak link in the AGI chain? Much like we could not achieve a true AI by manually coding each rule with if's and else, perhaps we cannot achieve true intelligence by manually designing the networks. Can we grow a model organically so it immediately gets this right from a single training example, and continue using the same network to keep on learning from rich input and making new observations. For guidance, the network can use what it already knows, learning outward. A pre-programmed inherent ability to find motifs, for example with STUMPY and Time Series Analysis, allows it to make relationship observations, and is how the agent immediately guesses that a numeric pattern increments by 1 from a single training example. Putting this into DL, perhaps we can achieve something less organic but still good using a model-agnostic architecture with multi-resolution 'tiles' of neurons. Some property of the information would theoretically allow deciding if the tile should be upgraded to a higher resolution (more neurons), and a side buffer keeps track of the connections between these tiles and tries to move the topology forward, adding some small clusters, removing others, attempting to connect them, etc. submitted by /u/o_snake-monster_o_o_ [link] [comments]  ( 2 min )
    [D] Understanding the use of EMA in Diffusion models
    Reading the original diffusion models paper and the improved diffusion model by openAI, I noticed they are using EMA (exponential moving average) to update the parameters of the models. so I started looking at the code openAI published for their version of the diffusion models, and when looking at the code, I see that the model during the training process has its params stored in a variable called "master_params" and then they create a deep copy of the params and call them ema_params. when looking at the "optimize_normal" method, I see that they update the model params using AdamW and gradient descent, and then after that, they update the ema param variable using the EMA equation, so that means the actual model params do a full gradient descent step to the reach minimum of the loss function, and then they do a pseudo step from the original parameters before the optimizer and making them closer to the params after the optimizer. but then looking at the rest of the code, all I see is that they just save a checkpoint of the ema params to the disk but never update the model params using them or anything. so my question is, what is the EMA for if it is not used during training and the model is fully updated using "classical" machine learning optimization with gradient descent? only at inference time do they load the EMA params to generate images, instead of the regular params that were updated using the AdamW? submitted by /u/eyalmazuz [link] [comments]  ( 2 min )
    [P] Benchmarking and profiling Hugging Face training with Graphsignal
    We've recently added Hugging Face support to https://github.com/graphsignal/graphsignal profiler, which I'd like to share in case someone finds it useful in their efforts to optimize speed and compute. More details, code and screenshots in the blog post https://graphsignal.com/blog/benchmarking-and-profiling-hugging-face-training-with-graphsignal/. submitted by /u/l0g1cs [link] [comments]
    [N] MIT/Meta AI released their new SOTA unsupervised sentence embedding model "DiffCSE"
    Researchers from MIT/Meta recently released a new framework for unsupervised sentence embedding. The performance seems to be better than SimCSE, the previous SOTA, by 2.3 absolute points on downstream tasks. The pretrained models are available on Huggingface. GitHub: https://github.com/voidism/DiffCSE arXiv: https://arxiv.org/abs/2204.10298 submitted by /u/virtualenv [link] [comments]  ( 1 min )
    [D] What do you think of the double standard where an AI learning from copyrighted material is “stealing”, but when a human does it that’s just education?
    Some useful optional considerations: Assume copies of the original copyrighted work are owned lawfully. Assume the AI maintains a single active copy to avoid group performance. The implications this has on banning possibly uploading your own consciousness to save your life on copyright grounds. This “stealing” concept people jump to seems to bring up some interesting logical contradictions. What do you think? submitted by /u/sext-scientist [link] [comments]  ( 2 min )
    [News] New Jupyter Notebook competition
    Are you passionate about coding, data science or Earth observation? https://preview.redd.it/wfwk9ifo4vv81.png?width=1920&format=png&auto=webp&s=5f11885bd8efe88986c181b565cc160534634f0b We're looking for bright-minded people from around the world to showcase their skills and develop new Jupyter Notebooks using Copernicus data! Sound interesting? Find out more here: https://www.eumetsat.int/science-blog/new-jupyter-notebook-competition submitted by /u/EUMETSAT [link] [comments]
    [P] K-Prototypes and evaluating model performance and drift
    I have reached a blocking point in a current project. We are using K-Prototypes to segment two populations (one with 100k elements and the other with 1M). In order to evaluate the clustering, during our K-Means stage we used silhouettes which are already implemented in sklearn. For the second stage, this became a problem. Either it was a time issue or a memory issue. So, for the 100k dataset, finding the silhouette profile took around 50 hours using a custom distance metric function. While this is feasible in the project's scope, for the 1M dataset the computation time would be highly impractical. As such, at the moment, we are stuck without a proper evaluation metric for our model. Sure, we can run Davies-Bouldin or other similar metrics, but the silhouette profile gave us much more detailed information. We are also moving the project to databricks. At first, the pyspark clustering evaluator had me hopeful, but it has very limited options regarding distance metrics. This is also an issue because the model is to be deployed in production and should have some metric informing when it needs to be retrained. While this point is still fluid, this is the preferred course of action. Has anyone faced similar issues using K-Prototypes? Or just silhouette profiles with custom distance metrics? submitted by /u/CaptMartelo [link] [comments]  ( 1 min )
    [P] We cleaned up Pascal and improved mAP by 13%
    How important is clean data for how your AI models perform? According to our experiments - very important. Using state-of-the-art confidence learning to clean up PASCAL, two people improved our primary model metric by 13% in a week. To learn more about our results and what we did check out our article: https://hasty.ai/content-hub/articles/cleaning-pascal-improving-map-by-13?utm_source=mk832ksa Disclaimer: We used our own platform to clean up the data and the article, therefore, contains self-promotion. However, the article mainly focuses on the results we achieved. submitted by /u/treebeard_hasty_ai [link] [comments]  ( 3 min )
    [R][P] Using language models for molecule captioning and text-based molecule generation
    Hi. We recently did some work on using language models for molecule captioning and text-based molecule generation. You can think of it as doing translation between molecules and natural language. Would love to know if you have any feedback 🤗. Arxiv: https://arxiv.org/abs/2204.11817 submitted by /u/SimilarShape9122 [link] [comments]  ( 1 min )
    [D] Making text-to-image even better - GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models, a 5-minute paper summary by Casual GAN Papers
    “Diffusion models beat GANs”. While true, the statement comes with several ifs and buts, not to say that the math behind diffusion models is not for the faint of heart. Alas, GLIDE, an OpenAI paper from last December took a big step towards making it true in every sense. Specifically, it introduced a new guidance method for diffusion models that produces higher quality images than even DALL-E, which uses expensive CLIP reranking. And if that wasn’t impressive enough, GLIDE models can be fine-tuned for various downstream tasks such a inpainting and and text-based editing. As for the details, let’s dive in, shall we? Full summary: https://t.me/casual_gan/289 Blog post: https://www.casualganpapers.com/faster-diffusion-models-text-to-image-classifier-free-guidance/GLIDE-explained.html GLIDE arxiv / code Join the discord community and follow on Twitter for weekly AI paper summaries! submitted by /u/KirillTheMunchKing [link] [comments]  ( 1 min )
  • Open

    Guide to Iteratively Tuning GNNs
    submitted by /u/aidev2040 [link] [comments]
    Exploring Neural Networks Visually in the Browser
    submitted by /u/nickb [link] [comments]
    What has priority in the performance?
    I'm really new to neural networks and I was wondering how to speed up the process of selecting the best one when I do the training. What I mean is, among the training/validation/test quotas, the number of hidden layers, the type of activation functions, the size of each layers, how do I iterate to find the optimal combination without having to try every little mix? Is there a way to rank the impact of these 4 components on the mse and iterate one at a time to select each aspect? Thanks in advance submitted by /u/beppegrosso97 [link] [comments]  ( 1 min )
  • Open

    Guide to Iteratively Tuning GNNs
    submitted by /u/aidev2040 [link] [comments]
    Yuval Noah Harari: "One of the things many people don't realize about the AI revolution and the automation revolution: They imagine it as some kind of a one-time event ... This is an extremely unlikely scenario, because we are nowhere near the maximum potential of AI." (3-min. clip)
    submitted by /u/frog9913 [link] [comments]  ( 1 min )
    Dreamy Trippy AI generated Video! VQGAN CliP Rife-RealESRGAN upscale...
    submitted by /u/LordPewPew777 [link] [comments]
    AI Dream 33 - Battestar Galactica Nebula Explosion
    submitted by /u/LordPewPew777 [link] [comments]
    US: Cisco and Verizon collaborated on a successful proof of concept demo in Las Vegas 'meet the latency thresholds required for autonomous driving applications – replacing the costly roadside radios previously required to meet those needs.'
    submitted by /u/dannylenwinn [link] [comments]  ( 1 min )
    Last Week in AI: AI uses in government surveillance, AI teaches human drivers, actors union opposes AI actors, and more!
    submitted by /u/regalalgorithm [link] [comments]
    AI Dream 44 - Epic Cathedral Supernatural Visit
    submitted by /u/LordPewPew777 [link] [comments]
    Resources
    Do you know reliable sites where to learn artificial intelligence? (i'm studying computer engineering) but i already want to study something in advance submitted by /u/oraudev [link] [comments]  ( 1 min )
    The Future of Apps: Intelligence
    https://blog.r2c.io/the-future-of-apps-intelligence/ submitted by /u/R2Consulting [link] [comments]
    We cleaned up Pascal and improved mAP by 13%
    How important is clean data for how your AI models perform? According to our experiments - very important. Using state-of-the-art confidence learning to clean up PASCAL, two people improved our primary model metric by 13% in a week. To learn more about our results and what we did check out our article: https://hasty.ai/content-hub/articles/cleaning-pascal-improving-map-by-13?utm_source=da39a3ee Disclaimer: We used our own platform to clean up the data and the article, therefore, contains self-promotion. However, the article mainly focuses on the results we achieved. submitted by /u/treebeard_hasty_ai [link] [comments]  ( 1 min )
    Interested in cognitive science? "Joscha Bach Bits" is a new YouTube channel dedicated to the renowned cognitive scientist Joscha Bach...
    As featured on the Lex Fridman podcast, the Singularity weblog podcast and the Future of Life Institute podcast The channel features shorts of Joscha's opinions and perspectives edited from podcasts. You can check out the trailer, which mostly consists of podcast hosts' minds imploding. All channel videos Channel playlists Channel creator: /u/24karate Enjoy 🤖 submitted by /u/tasinet [link] [comments]  ( 1 min )
    Machine learning's abiding weakness is verification
    submitted by /u/koavf [link] [comments]
    Artificial Nightmares: Monsters Inc || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
    A.I. Is Mastering Language. Should We Trust What It Says? • OpenAI’s GPT-3 and other neural nets can now write original prose with mind-boggling fluency — a development that could have profound implications for the future.
    submitted by /u/Naurgul [link] [comments]  ( 1 min )
    Making text-to-image even better - GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models, a 5-minute paper summary by Casual GAN Papers
    “Diffusion models beat GANs”. While true, the statement comes with several ifs and buts, not to say that the math behind diffusion models is not for the faint of heart. Alas, GLIDE, an OpenAI paper from last December took a big step towards making it true in every sense. Specifically, it introduced a new guidance method for diffusion models that produces higher quality images than even DALL-E, which uses expensive CLIP reranking. And if that wasn’t impressive enough, GLIDE models can be fine-tuned for various downstream tasks such a inpainting and and text-based editing. As for the details, let’s dive in, shall we? Full summary: https://t.me/casual_gan/289 Blog post: https://www.casualganpapers.com/faster-diffusion-models-text-to-image-classifier-free-guidance/GLIDE-explained.html GLIDE arxiv / code Join the discord community and follow on Twitter for weekly AI paper summaries! submitted by /u/KirillTheMunchKing [link] [comments]  ( 1 min )
  • Open

    Build and deploy a scalable machine learning system on Kubernetes with Kubeflow on AWS
    In this post, we demonstrate Kubeflow on AWS (an AWS-specific distribution of Kubeflow) and the value it adds over open-source Kubeflow through the integration of highly optimized, cloud-native, enterprise-ready AWS services. Kubeflow is the open-source machine learning (ML) platform dedicated to making deployments of ML workflows on Kubernetes simple, portable and scalable. Kubeflow provides many […]  ( 15 min )
    Create random and stratified samples of data with Amazon SageMaker Data Wrangler
    In this post, we walk you through two sampling techniques in Amazon SageMaker Data Wrangler so you can quickly create processing workflows for your data. We cover both random sampling and stratified sampling techniques to help you sample your data based on your specific requirements. Data Wrangler reduces the time it takes to aggregate and […]  ( 7 min )
    Part 4: How NatWest Group migrated ML models to Amazon SageMaker architectures
    The adoption of AWS cloud technology at NatWest Group means moving our machine learning (ML) workloads to a more robust and scalable solution, while reducing our time-to-live to deliver the best products and services for our customers. In this cloud adoption journey, we selected the Customer Lifetime Value (CLV) model to migrate to AWS. The […]  ( 12 min )
    Part 3: How NatWest Group built auditable, reproducible, and explainable ML models with Amazon SageMaker
    This is the third post of a four-part series detailing how NatWest Group, a major financial services institution, partnered with AWS Professional Services to build a new machine learning operations (MLOps) platform. This post is intended for data scientists, MLOps engineers, and data engineers who are interested in building ML pipeline templates with Amazon SageMaker. […]  ( 8 min )
    Part 2: How NatWest Group built a secure, compliant, self-service MLOps platform using AWS Service Catalog and Amazon SageMaker
    This is the second post of a four-part series detailing how NatWest Group, a major financial services institution, partnered with AWS Professional Services to build a new machine learning operations (MLOps) platform. In this post, we share how the NatWest Group utilized AWS to enable the self-service deployment of their standardized, secure, and compliant MLOps […]  ( 12 min )
    Part 1: How NatWest Group built a scalable, secure, and sustainable MLOps platform
    This is the first post of a four-part series detailing how NatWest Group, a major financial services institution, partnered with AWS to build a scalable, secure, and sustainable machine learning operations (MLOps) platform. This initial post provides an overview of the AWS and NatWest Group joint team implemented Amazon SageMaker Studio as the standard for […]  ( 11 min )
    Accelerate data preparation with data quality and insights in Amazon SageMaker Data Wrangler
    Amazon SageMaker Data Wrangler is a new capability of Amazon SageMaker that helps data scientists and data engineers quickly and easily prepare data for machine learning (ML) applications using a visual interface. It contains over 300 built-in data transformations so you can quickly normalize, transform, and combine features without having to write any code. Today, […]  ( 7 min )
  • Open

    Reward after each action vs reward after taking all actions in an episodic environment of N steps
    Hello, I am working on compression of deep neural networks using reinforcement learning. There is one agent that learns to compress convolutional layers using C actions and another one that compresses dense layers using D actions. If there are two conv layers and 3 dense layers, 5 actions have to be selected in a sequence using both agents in order to fully compress the model. I read the paper AdaDeep and found it really useful for my research, but I don't get why the authors select all actions and they only calculate the reward after completely compressing the network instead of getting the reward after each action. In their place I would select the action, calculate the reward of that action and store it in the replay. By only using immediate reward, the agent should be able to learn which sequence of actions would work the best for the current model. Why assign the same reward to the selected actions for each layer? Is it only because the outcome was due to the combination of actions and they want to speed up training? If my understanding is correct, assigning the immediate reward to each action would yield the same results in the long run. Thanks in advance. submitted by /u/ElvishChampion [link] [comments]  ( 1 min )
    Multi Agent RL: agents act in different frequencies?
    After reading D, Multi's post, I'm wondering is it possible that two different agents to take action in their own action space & their own frequency? submitted by /u/YMXin1999 [link] [comments]
    I’m going to build a game where the goal will be making many deliveries using the shortest route. How might the environment best be represented to the agent?
    The environment makes a lot of sense to me, as a human. It’s a network composed of nodes and edges. Nodes are the points in the network, and edges are the lines connecting them. The entire thing resembles a real street network and uses coordinate points to position itself on a graph. Of the entire map, select nodes will represent gas stations, and other select nodes will represent stop locations for deliveries. The agent will start at a node and map a route to each location, essentially by stringing together an array of connected edges. It’ll also need to travel to gas stations every X miles or it’ll run out of gas. Now, I’ve never done this before so I’m gonna bounce some of my ideas off a wall here Passing the entire thing to an agent and having it render a graph and whatnot to determ…  ( 2 min )
    What is a train step counter?
    In this repository that I'm looking at, there's an input variable whose meaning I don't understand. At line 135: https://github.com/google-research/google-research/blob/722494ce68130a7409bf94500002c79014905d53/social_rl/multiagent_tfagents/multiagent_ppo.py#L135 train_step_counter: An optional counter to increment every time the train op is run. Defaults to the global_step. What is train_step_counter? submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    exponential weighted average
    hey guys I guess this can be trivial for the most of you but I can't get around it, how do I prove this is an exponential weighted average. Thanks The equation is labeled (10) in the link https://chowdera.com/2021/12/202112200806190809.html from reinforcement learning: an introduction by Sutton and barto , tracking a non stationary problem in chapter 2 submitted by /u/ma7modbasha [link] [comments]  ( 1 min )
  • Open

    Data quality: What and why is it important?
    With the internet producing quintillions of readily available information per day, you could be forgiven to think that data is losing its value. Apparently, data is one of those weird commodities that go up in value the more they are available, or perhaps we haven’t produced enough to attain the demand-supply equilibrium. Virtually all companies… Read More »Data quality: What and why is it important? The post Data quality: What and why is it important? appeared first on Data Science Central.  ( 3 min )
    2 Ways in Which Automatic Data Labeling Saves Time and Costs
    Data scientists face a problem: machine learning models need to be trained on labeled datasets, but labeling the data is tedious and time-consuming. Enter automatic data labeling, in which most of the preprocessing work is done by a computer.  At first glance, automatic data labeling sounds too good to be true. Of course, more automation… Read More »2 Ways in Which Automatic Data Labeling Saves Time and Costs The post 2 Ways in Which Automatic Data Labeling Saves Time and Costs appeared first on Data Science Central.  ( 4 min )
    DSC Weekly Newsletter 26 April 2022: Why The Case for RTO Remains Weak
    As the Omicron variant of Covid-19 surged around the globe in 2021, managers who had begun contingency plans for a return to the office quietly shelved them to wait out the next wave. Heading into the summer of 2022, the omicron-delta variant lurks on the horizon, though whether or not this will trigger the massive… Read More »DSC Weekly Newsletter 26 April 2022: Why The Case for RTO Remains Weak The post DSC Weekly Newsletter 26 April 2022: Why The Case for RTO Remains Weak appeared first on Data Science Central.  ( 10 min )
    4 Successful Integrated Marketing Communications Examples
    Integrated Marketing Communications (IMC) is an effective communication process that is intended to strengthen the relationship between the customer and the company while enhancing the company’s sales. IMC utilizes a combination of traditional and new approaches in marketing. IMC uses the channel that is most effective to reach the customer. This blog will look at… Read More »4 Successful Integrated Marketing Communications Examples The post 4 Successful Integrated Marketing Communications Examples appeared first on Data Science Central.  ( 4 min )
    No, AI won’t replace astronauts – and here’s why
    A new book predicts artificial intelligence will soon replace astronauts. The authors posit that robots are cheaper, more reliable, and better suited to space travel. But with the human desire for exploration, AI is unlikely to replace astronauts fully. AI will close the gap with human capabilities in the next few decades and surpass them… Read More »No, AI won’t replace astronauts – and here’s why The post No, AI won’t replace astronauts – and here’s why appeared first on Data Science Central.  ( 4 min )
    Smart Factory- Building Future with 5G
    The implementation of digital technologies blurs the line between the physical and digital world. It has become clear that there is a strong need for digital transformation to achieve the next level of efficiency, connectivity, and flexibility needed in manufacturing to weather modern-day disruptions, risks, and fluctuating demands. 5G SMART believes that 5G will be… Read More »Smart Factory- Building Future with 5G The post Smart Factory- Building Future with 5G appeared first on Data Science Central.  ( 2 min )
    Using Stakeholder Journey Maps to Re-invent, not Just Optimize, Your Business Processes
    Stakeholder Journey Maps are a fabulous tool to intimately understand what a stakeholder is trying to accomplish (their objectives and intentions) and the steps/actions/decisions that stakeholder needs to make to complete their journey. Stakeholder Journey Maps are commonly used to help designers to create the optimal user interface and nicely segue into UI storyboards and… Read More »Using Stakeholder Journey Maps to Re-invent, not Just Optimize, Your Business Processes The post Using Stakeholder Journey Maps to Re-invent, not Just Optimize, Your Business Processes appeared first on Data Science Central.  ( 5 min )
    An analysis of Digital Twin Applications across industries
    Background Digital Twins are virtual representations of physical objects, and they can be connected with their physical counterparts. Through this connection, Digital Twins contribute to the convergence of the real and the virtual world. While the Digital twin’s concept is focused on the manufacturing industry, the paper “Dimensions of Digital Twin Applications – A Literature… Read More »An analysis of Digital Twin Applications across industries The post An analysis of Digital Twin Applications across industries appeared first on Data Science Central.  ( 6 min )
    Make Sure Your Online Data Science Courses Teach These 6 Core Skills
    Data science is a wide field with many specializations, and an individual can have a great career with a data science degree. However, curriculums vary between schools, and the specific data science classes taught in one school may not be taught in another. There are several core skills in the data science field that recruiters… Read More »Make Sure Your Online Data Science Courses Teach These 6 Core Skills The post Make Sure Your Online Data Science Courses Teach These 6 Core Skills appeared first on Data Science Central.  ( 3 min )
    What Personal Knowledge Graphs Have to Do with Business
    I help lead a working group focused on personal knowledge graphs (PKGs). Lately, it’s functioned as a discussion and demo evaluation group for new technologies and how they might be used in a knowledge graph context.   Different individuals want to annotate different kinds of data. Some do a lot of research. For them, the need is… Read More »What Personal Knowledge Graphs Have to Do with Business The post What Personal Knowledge Graphs Have to Do with Business appeared first on Data Science Central.  ( 4 min )
    Healthcare App Development: Why You Should Opt for React
    The world of healthcare has consistently evolved, yes, but the fact remains it has gone through tremendous change ever since the coronavirus pandemic started, thus driving the need for modern solutions to meet the increasingly varying needs of patients. In this context, mobile apps have proven to be the leading tool that has driven focus… Read More »Healthcare App Development: Why You Should Opt for React The post Healthcare App Development: Why You Should Opt for React appeared first on Data Science Central.  ( 3 min )
    Why Agile Often Fails and What to Do When It Happens
    Agile, Agile 2 and Agility, Part III In the previous articles in this series, we discussed the role that agile digital delivery capabilities plays in your company’s competitiveness and why rapid delivery is so important.  This article will look at the many reasons that Agile adoptions frequently fail to deliver what companies expect and suggest… Read More »Why Agile Often Fails and What to Do When It Happens The post Why Agile Often Fails and What to Do When It Happens appeared first on Data Science Central.  ( 7 min )
    How to Protect Your Computer Data
    When it comes to protecting your computer, you can do a few basic things. First, create separate user accounts for work and personal data. Make sure to back up your data and use a firewall. Also, make sure to encrypt it.You should make sure to back up any important documents or photos you may have… Read More »How to Protect Your Computer Data The post How to Protect Your Computer Data appeared first on Data Science Central.  ( 4 min )
  • Open

    Misconceptions about AI, Robotics, and Machine Learning
    Yes and no. So, if you asked this question, good one! When I was new to this stuff, I had the same question and searched up a lot about it.  ( 3 min )
  • Open

    In the NVIDIA Studio: April Driver Launches Alongside New NVIDIA Studio Laptops and Featured 3D Artist
    This week In the NVIDIA Studio, we’re launching the April NVIDIA Studio Driver with optimizations for the most popular 3D apps, including Unreal Engine 5, Cinema4D and Chaos Vantage. The driver also supports new NVIDIA Omniverse Connectors from Blender and Redshift. The post In the NVIDIA Studio: April Driver Launches Alongside New NVIDIA Studio Laptops and Featured 3D Artist appeared first on NVIDIA Blog.  ( 5 min )
  • Open

    A smarter way to develop new drugs
    A new artificial intelligence technique only proposes candidate molecules that can actually be produced in a lab.  ( 6 min )
  • Open

    Random Blaschke products and Mathematica binding
    A Blaschke product is a function that is the product of Blaschke factors, functions of the form b(z; a) = |a|  (a – z) / a (1 – a*z) where the complex number a lies inside the unit circle and a* is the complex conjugate of a. I wanted to plot Blaschke products with random […] Random Blaschke products and Mathematica binding first appeared on John D. Cook.  ( 2 min )

  • Open

    [D] Hypothetically, what's the value in being able to label ~500 million images a day?
    I'm not going to make this vague. Specifically I'm seeing a lot of comments related to Elon Musk saying he's going to remove bots from Twitter. There's a lot of speculation on how this could be done with comments suggesting a captcha-based system for every post/reply (or maybe every action?). More specifically people seem fixated on captcha systems that can't be botted. (Ignore for a moment the accessibility issues and audio fallbacks that might be required). I'm aware that Tesla transitioned to using more massive synthetic datasets for training, so this might be somewhat outdated. That said they do have a lot of data collection of new real world data from their sensors. This has me curious on the estimated value of large-scale captcha systems directly tied with a company that might need large-scale labeling services. I'm sure researchers here have run numbers on labeling services and how to best to utilize them. (Goes without saying the prices vary quite a bit and cover a wide range of tasks from simple bounding boxes to relatively expensive polygons for semantic labeling. For example just as a reference: https://cloud.google.com/ai-platform/data-labeling/pricing ). Most services don't list prices for like 500 million tasks which makes sense given that's a lot. A captcha system would have independent users redoing tasks multiple times to find baselines, but in general it would average to find a ground truth label. This redundant work isn't wasted in this sense as it can find say difficult scenarios and refine labels. I could naively say 500 million / (10 USD/1000 tasks) = 5 million USD a day not counting server costs and development. Not super scientific though and it seems high. (I have doubts on if gathering 500 million samples of value a day to even get labeled is realistic). I digress, what would you say is the hypothetical value of such a system using short captcha tasks? submitted by /u/Sirisian [link] [comments]  ( 2 min )
    [N] Modular Reasoning, Knowledge, & Language (MRKL) Hybrid System For More 'General' NLP
    AI21 Labs’ Modular Reasoning, Knowledge and Language (MRKL, pronounced “miracle”) system – and Jurassic-X includes one or more language models, and augment them with external knowledge sources as well as symbolic reasoning experts that can handle tasks that lie beyond the reach of neural models. There are 55 different task-specific modules that MRKL currently supports. If the router is unsure which module is best, it calls on Jurassic-1. Jurassic also helps compose the contextual language around MRKL’s response. This allows MRKL to give factual answers with up-to-date information instead of being limited to its training data alone, and gives it the ability to carry out a much wider range of NLP tasks as compared to other LLM's like Google's PaLM or OpenAI's GPT-3 AI21 blog and whitepaper here Video here submitted by /u/SlightSituation [link] [comments]  ( 1 min )
    [D] Opinions on NVIDIA TAO Toolkit?
    I'm working on an Edge ML product where we train models in the cloud and then run them on a device using tensorRT. We're considering switching to using the Nvidia TAO Toolkit for training. If you've used TAO, do you like it? Is it limiting? Our alternative is training in pytorch and then converting to ONNX and then tensorRT separately. Thanks! submitted by /u/linguistBot [link] [comments]  ( 1 min )
    [R][P] An arxiv-sanity-like view of ICLR 2022 papers
    submitted by /u/tanelai [link] [comments]  ( 1 min )
    [D] Paper Explained - ACCEL: Evolving Curricula with Regret-Based Environment Design (Video Walkthrough)
    https://youtu.be/povBDxUn1VQ Automatic curriculum generation is one of the most promising avenues for Reinforcement Learning today. Multiple approaches have been proposed, each with their own set of advantages and drawbacks. This paper presents ACCEL, which takes the next step into the direction of constructing curricula for multi-capable agents. ACCEL combines the adversarial adaptiveness of regret-based sampling methods with the capabilities of level-editing, usually found in Evolutionary Methods. ​ OUTLINE: 0:00 - Intro & Demonstration 3:50 - Paper overview 5:20 - The ACCEL algorithm 15:25 - Looking at the pseudocode 23:10 - Approximating regret 33:45 - Experimental results 40:00 - Discussion & Comments ​ Website: https://accelagent.github.io Paper: https://arxiv.org/abs/2203.01302 submitted by /u/ykilcher [link] [comments]  ( 1 min )
    [D] Parameter Efficiency without Computational Efficiency
    Hello hivemind. Context I am working on streamlined design strategies for (manual) neural architecture design. I recently came across a simple, receptive field-based strategy that allows me to reliably improve the architectures of some SOTA models like EfficientNet by up +1.5% top1-accuracy.That being said, there seems to be a trade-off between performance and number of computations I put into the same number of parameters.Basically, the more computationally expensive the architectural change is, the better the performance turns out to be. The number of parameters does not change in the process. The model becomes therefore more parameter efficient but computationally less efficient, which you could consider a somewhat Pyrrhic victory. Now to my question: Is there use for parameter efficiency in models when it does not also coincide with computational efficiency? Is there any literature on the topic you can recommend? submitted by /u/KrakenInAJar [link] [comments]  ( 1 min )
    [D] Calculating feature importance: which model/training set to use in the case of cross validation?
    So far I've only seen feature importance calculated on one model/ training set at a time, However, with cross validation there are many models built on different training sets, how do you calculate feature importance in this case? Just do it on one model? Or calculate feature importance for each model and somehow aggregate it (e.g., by averaging)? submitted by /u/Comprehensive-Egg707 [link] [comments]  ( 2 min )
    [R] Recent Trends In Diffusion-Based Text-Conditional Image Synthesis
    Hello, I wrote the blog post about text-conditional image generation using diffusion models (including DALLE-2). Let me know what you think! ​ https://sangyun884.github.io/recent-trends-in-diffusion-based-text-conditional/ submitted by /u/Impressive-Mirror430 [link] [comments]
    [N][P] Use GitHub Actions for ML with DagsHub Connect
    Hey r/MachineLearning, Nir from DagsHub here, and I'm thrilled to share a project we're launching today that will hopefully unlock GitHub Actions for ML. GitHub Actions solved a DevOps burden many of us felt by providing an easy-to-configure CI/CD tool to build, test, and deploy pipelines. However, when it comes to ML pipelines, and working with data, models, and experimentation in mind, the workflow is not as well defined and can get tricky to implement. DagsHub is kind of like GitHub for machine learning, which extends what GitHub did for code management to track data, models, experiments, and data pipelines. We do this by integrating awesome open source tools like DVC, MLflow, Label Studio, and more. One of our most requested features was a deeper integration with GitHub that will let…  ( 1 min )
    [R] ICLR 2022 Blog Post: The 37 Implementation Details of Proximal Policy Optimization
    Hi folks, our ICLR 2022 Blog post on "The 37 Implementation Details of Proximal Policy Optimization" is live 😀 Our post makes it easier to understand the nitty-gritty PPO's implementations with 1) 🎥 video tutorials 2) 📜 detailed references and explanations 3) ⌨️ really simple code Here are the links: Official URL: https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/ Twitter thread: https://twitter.com/vwxyzjn/status/1518589115163369472 OpenReview link:https://openreview.net/forum?id=Hl6jCqIp2j GitHub repo: https://github.com/vwxyzjn/ppo-implementation-details YouTube tutorial on PPO: https://www.youtube.com/playlist?list=PLD80i8An1OEHhcxclwq8jOMam0m0M9dQ_ I am the main author & feel free to ask me anything here. submitted by /u/vwxyzjn [link] [comments]  ( 1 min )
    [N] Learn how the open-source ecosystem can be used in your machine learning and data science classes.
    Hey, 🤗 Hugging Face is offering a workshop (June 6) for instructors of machine learning and data science who would like to learn how the open-source ecosystem can be used in their classes. After this workshop, you will know how to: 🧑‍💻 Teach Transformers models & famous ML libraries 🤖 Onboard students to the Hub to build/host projects 💾 Publish models/datasets in a few lines of code During the workshop, you will be invited to join the following page for a better understanding of our open-source solutions: https://huggingface.co/teach For more details about the workshop content, visit: https://hf.co/teaching Feel free to register here:) submitted by /u/VioletteLep [link] [comments]  ( 1 min )
    [D] Does anyone know a large varied image dataset that do NOT contain humans?
    It could be pictures of anything, except of humans. But it would be better if it were not focused on a single topic like dog images. submitted by /u/TheManveru [link] [comments]  ( 1 min )
    [R] A new dataset and a library that you can use for ML and RL over the Web
    TL;DR: Download dataset of labelled Web pages, WebTraversalLibrary for scripting web interactions Hi everyone! Our group at Klarna has been putting in a ton of work into deep learning for the Web over the past few years and we've made a couple of useful resources available for the research community. You might find them interesting if you're looking for new ideas for spare-time or even post-grad research projects. We've open-sourced a dataset of about 50k labeled product web pages from roughly 8000 distinct e-commerce merchants, available as MHTML and WebTraversalLibrary clones (see next point :) ), along with the corresponding screenshots. Not all of the MHTMLs render correctly, but the ones that do also have screenshots in a corresponding dataset for CV applications. You can find doc…  ( 2 min )
    [D] Is anyone working on open-sourcing Dall-E 2?
    Just like Eleuther did with GPT3? submitted by /u/invertedpassion [link] [comments]  ( 2 min )
    [P] Demo of Google's new PHORUM Image➠3D Figure Project
    submitted by /u/NichodonARG [link] [comments]
  • Open

    Chatbots
    Does anyone know where chatbots specifically the ones on Chai get their information? This one knew what meta-analysis was and knew who the main character of a show was I do not believe the bots are real people based on both the information I've seen and my experience with them glitching out Plus occam's razor yk Im guessing they can use google or something but it's odd because they clearly don't have access to the time submitted by /u/Shadowfax42- [link] [comments]  ( 1 min )
    Arcane Style Transfer
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    The most basic AI scheme.
    submitted by /u/idvlknv [link] [comments]
    Nvidia AI Designs GPU 3,600x Faster | Breakthrough MRKL NLP Techniques | AI Mind To Predict Dementia
    submitted by /u/getrich_or_diemining [link] [comments]
    Is there a AI that is able to turn normal pictures in that kind of pictures or that I am able to plott them with my pen-plotter?
    submitted by /u/xXNOdrugsForMEXx [link] [comments]  ( 1 min )
    What are cool AI tools which I could use to edit images/videos?
    (I have no experience with photoshop) submitted by /u/xXNOdrugsForMEXx [link] [comments]
    Are You The Asshole? New AI Mimics Infamous Advice Subreddit
    submitted by /u/Emmanuel_T_Goldstein [link] [comments]
    Bias in Artificial Intelligence; Is Diversity the Key to the Future Of AI?
    submitted by /u/JencyJane [link] [comments]
    Ai Becomes Sentient 3
    submitted by /u/webauteur [link] [comments]
    To be AI-first, do AI last
    submitted by /u/bendee983 [link] [comments]
    GPT-3 not available in my country
    is there a way to access any of openAI APIs if it's not avalable in my country? submitted by /u/dogaryy [link] [comments]  ( 1 min )
    Text Summarization with Huggingface Transformers and Python
    submitted by /u/RubiksCodeNMZ [link] [comments]
    Reface app deepfake technology
    submitted by /u/Aggravating-Deal-260 [link] [comments]
    Demo of Google's new PHORUM Image→3D Figure Project
    submitted by /u/NichodonARG [link] [comments]
  • Open

    AI Hitman Learns to Find Waldo
    submitted by /u/TernaryJimbo [link] [comments]
    Help with walk forward validation in LSTM problem
    Hi, I am having some trouble with a LSTM problem regarding walk forward validation in my LSTM. The problem is described in this stackoverflow post: https://stackoverflow.com/questions/71990833/using-predictions-instead-of-observed-values-in-walk-forward-validation-in-lstm If any one could help me that would be much appreciated submitted by /u/magnussendjoko [link] [comments]  ( 1 min )
    I don't think this was posted here before but it's incredible. The latest image generation from text
    submitted by /u/gwtkof [link] [comments]
    NN from Scratch: #5 Updating parameters | Kolbenkraft
    submitted by /u/cjmodi306 [link] [comments]
    Text Summarization with Huggingface Transformers and Python
    submitted by /u/RubiksCodeNMZ [link] [comments]
  • Open

    Host Hugging Face transformer models using Amazon SageMaker Serverless Inference
    The last few years have seen rapid growth in the field of natural language processing (NLP) using transformer deep learning architectures. With its Transformers open-source library and machine learning (ML) platform, Hugging Face makes transfer learning and the latest transformer models accessible to the global AI community. This can reduce the time needed for data […]  ( 8 min )
    How Nordic Aviation Capital uses Amazon Rekognition to streamline operations and save up to EUR200,000 annually
    Nordic Aviation Capital (NAC) is the industry’s leading regional aircraft lessor, serving almost 70 airlines in approximately 45 countries worldwide. In 2021, NAC turned to AWS to help it use artificial intelligence (AI) to further improve its leasing operations and reduce its reliance on manual labor. With Amazon Rekognition Custom Labels, NAC built an AI […]  ( 5 min )
  • Open

    Estimating the informativeness of data
    MIT researchers can now estimate how much information data are likely to contain, in a more accurate and scalable way than previous methods.  ( 6 min )
    An easier way to teach robots new skills
    Researchers have developed a technique that enables a robot to learn a new pick-and-place task with only a handful of human demonstrations.  ( 7 min )
  • Open

    SimpleGrid env for OpenAI gym
    SimpleGrid is a simple gridworld environment for OpenAI gym. It is easy to use and customise and it is intended to offer an environment for quick testing and prototyping different RL algorithms. I developed this environment by taking inspiration from the FrozenLake environment and gym-minigrid. Check it out at: https://github.com/damat-le/gym-simplegrid ​ https://i.redd.it/prqd7muujqv81.gif submitted by /u/damat-le [link] [comments]  ( 1 min )
    Hybrid CPU topology impact on training
    Hi, some newer CPU's (Apple M1, Intel Alder Lake) have started using a hybrid CPU topology. I.e. the CPU consists of some P-cores (high performance cores) and E-cores (slower 'eco' cores). I personally do not own such a CPU yet but I'm considering upgrading to one soon, so I am looking for experiences of people with such a CPU on training reinforcement models. Does everything work as expected? Are there annoyances? I'm using Ray Tune + RLlib in which I assign a single core per trial. In this case I'm expecting that trials running on E-cores will simply run (much) slower than those running on P-cores. In case a single trial gets assigned both a P-core and an E-core, I do expect a serious slow-down. These are all guesses really, so I'm looking for people with actual experience here. Thanks. submitted by /u/katsu9 [link] [comments]  ( 1 min )
    Policy Iteration on OpenAI Gym taxi-v3
    Hey everyone, I managed to implement the policy iteration from Sutton & Barto, 2018 on the FrozenLake-v1 and wanted to do the same now Taxi-v3 environment. My code has been running now for 45min so I guess there is something wrong, but I can't wrap my head around what it could be. Would appreciate some input on what I need to change so that it will work. Please see my code here: ```[python] import gym # openAi gym import torch import matplotlib.pyplot as plt from tqdm import trange # progressbar torch.manual_seed(4) env = gym.make('Taxi-v3') def policy_evaluation(env: gym.Env, policy: torch.Tensor, gamma: float, threshold: float): V = torch.zeros(env.observation_space.n) delta = float("inf") while delta >= threshold: V_tmp = torch.empty(env.observation_space.n) for state in range(…  ( 1 min )
    ICLR 2022 Blog Post: The 37 Implementation Details of Proximal Policy Optimization
    submitted by /u/vwxyzjn [link] [comments]  ( 1 min )
    Deep Reinforcement Learning Free Class by Hugging Face 🤗
    Hey there! We're happy to announce the launch of the Hugging Face Deep Reinforcement Learning class! 🤗 👉 Register here https://forms.gle/oXAeRgLW4qZvUZeu9 In this free course, you will: 📖 Study Deep Reinforcement Learning in theory and practice. 🧑‍💻 Learn to use famous Deep RL libraries such as Stable Baselines3, RL Baselines3 Zoo, and RLlib. 🤖 Train agents in unique environments with SnowballFight, Huggy the Doggo 🐶, and classical ones such as Space Invaders and PyBullet. 💾 Publish your trained agents in one line of code to the Hub. But also download powerful agents from the community. 🏆 Participate in challenges where you will evaluate your agents against other teams. 🖌️🎨 Learn to share your environments made with Unity and Godot. 👉 Register here https://forms.gle/oXAeRgLW4qZvUZeu9 📚 The syllabus: https://github.com/huggingface/deep-rl-class https://preview.redd.it/b409a9sscov81.jpg?width=1920&format=pjpg&auto=webp&s=ebdfa10c220b5a3dec17894bc0f955ed9d8f7634 If you have questions and feedback, I would love to answer them, Thanks, submitted by /u/cranthir_ [link] [comments]  ( 1 min )
    Has anyone solved deepmind control locomotion in state-based?
    Hi, I'm wondering if I could train the state-based locomotion well from the scratch. In my case, I trained SAC agent and It showed divergence of the q function. I think the cause is the unscaled vector input. Before the trouble shooting, I want to ask someone trained agent in this environment. Thank you for reading. https://github.com/deepmind/dm_control/tree/main/dm_control/locomotion submitted by /u/Spiritual_Fig3632 [link] [comments]  ( 1 min )
    Can't solve OpenAI problems
    I started off with OpenAI's mountain car, and I don't know where to even start. I got the setup working with the environment, but now have no idea how to train it. How do I learn to code for RL? Every tutorial I have seen so far has implemented Q Learning algorithms completely with little explanation. I looked at the solved code and it doesn't make sense to me. How should I prepare before I go into OpenAI? submitted by /u/TrepidationTD [link] [comments]  ( 1 min )
  • Open

    Google at ICLR 2022
    Posted by Cat Armato and Callan Hajosy, Program Managers The 10th International Conference on Learning Representations (ICLR 2022) kicks off this week, bringing together researchers, entrepreneurs, engineers and students alike to discuss and explore the rapidly advancing field of deep learning. Entirely virtual this year, ICLR 2022 offers conference and workshop tracks that present some of the latest research in deep learning and its applications to areas ranging from computer vision, speech recognition and text understanding to robotics, computational biology, and more. As a Platinum Sponsor of ICLR 2022 and Champion DEI Action Fund contributor, Google will have a robust presence with nearly 100 accepted publications and extensive participation on organizing committees and in workshops…  ( 12 min )
  • Open

    Projective duality
    The previous post explained how to define a projective plane over a field F. Now let’s look at how we do geometry in a projective plane. Definitions We have a definition of points from the other post: a point is a triple (a, b, c) of elements of F, with not all elements equal to […] Projective duality first appeared on John D. Cook.  ( 3 min )
    Finite projective planes
    Given a field F, finite or infinite, you can construct a projective plane over F by starting with pairs of elements of F and adding “points at infinity,” one point for each direction. Motivation: Bézout’s theorem A few days ago I mentioned Bézout’s theorem as an example of a simple theorem that rests on complex […] Finite projective planes first appeared on John D. Cook.  ( 4 min )
  • Open

    PPE: A fast and provably efficient RL algorithm for exogenous noise
    Picture a person walking in a park by a pond. The surrounding environment contains a number of moving objects that change the quality of the environment: clouds moving to hide the sun, altering the quality of light; ducks gliding across the pond, causing its surface to ripple; people walking along a path, their images reflecting […] The post PPE: A fast and provably efficient RL algorithm for exogenous noise appeared first on Microsoft Research.  ( 9 min )
  • Open

    Let Me Shoyu How It’s Done: Creating the NVIDIA Omniverse Ramen Shop
    When brainstorming a scene to best showcase the groundbreaking capabilities of the Omniverse platform, some NVIDIA artists turned to a cherished memory: enjoying ramen together in a mom-and-pop shop down a side street in Tokyo. Simmering pots of noodles, steaming dumplings, buzzing kitchen appliances, warm ambient lighting and glistening black ledger stools. These were all Read article > The post Let Me Shoyu How It’s Done: Creating the NVIDIA Omniverse Ramen Shop appeared first on NVIDIA Blog.  ( 4 min )
    Stellar Weather: Researchers Describe the Skies of Exoplanets
    A paper released today describes in the greatest detail to date the atmospheres on distant planets. Seeking the origins of what’s in and beyond the Milky Way, researchers surveyed 25 exoplanets, bodies that orbit stars far beyond our solar system. Specifically, they studied hot Jupiters, the largest and thus easiest to detect exoplanets, many sweltering Read article > The post Stellar Weather: Researchers Describe the Skies of Exoplanets appeared first on NVIDIA Blog.  ( 4 min )
  • Open

    Let’s Talk about Machine Translation: The powering engine behind “Google Translate”
    At some point, we’ve all used Google translate, Microsoft,DeepL or Bing translator to impress our friends/colleagues who speak a different…  ( 4 min )
  • Open

    Multiprocessing in Python
    When you work on a computer vision project, you probably need to preprocess a lot of image data. This is time-consuming, and it would be great if you could process multiple images in parallel. Multiprocessing is the ability of a system to run multiple processors at one time. If you had a computer with a […] The post Multiprocessing in Python appeared first on Machine Learning Mastery.  ( 9 min )
  • Open

    Should I Use Offline RL or Imitation Learning?
    Figure 1: Summary of our recommendations for when a practitioner should BC and various imitation learning style methods, and when they should use offline RL approaches. Offline reinforcement learning allows learning policies from previously collected data, which has profound implications for applying RL in domains where running trial-and-error learning is impractical or dangerous, such as safety-critical settings like autonomous driving or medical treatment planning. In such scenarios, online exploration is simply too risky, but offline RL methods can learn effective policies from logged data collected by humans or heuristically designed controllers. Prior learning-based control methods have also approached learning from existing data as imitation learning: if the data is generally “goo…  ( 11 min )

  • Open

    [D] Legality of Hosting ImageNet
    Despite it's immense popularity in academia, it's surprisingly difficult to download the ImageNet Object Localization dataset. As far as I can tell this is due to legal issues -- no single entity owns the copyright to the images, so no entity can host the whole dataset. The result is that if you want to use ImageNet you're forced to either manually scrape a million URLs (requiring both cpu time, your time, and imposing costs on a million unsuspecting websites), know somebody who has already done that, or fetch it from a legally questionable source. So I have a couple questions: Is the owner of the ImageNet dataset on Kaggle performing a selfless public service, thanklessly accepting legal liability to make ImageNet more accessible? Or is she protected (e.g. by fair use)? Is she required to accept DMCA requests? If I'd like to share ImageNet (I've recently downloaded it and processed it to be ~10 GB, which seems like a helpful thing to share), is there any legally safe path for me to do this? submitted by /u/you-get-an-upvote [link] [comments]  ( 3 min )
    [D] Machine Learning - WAYR (What Are You Reading) - Week 136
    This is a place to share machine learning research papers, journals, and articles that you're reading this week. If it relates to what you're researching, by all means elaborate and give us your insight, otherwise it could just be an interesting paper you've read. Please try to provide some insight from your understanding and please don't post things which are present in wiki. Preferably you should link the arxiv page (not the PDF, you can easily access the PDF from the summary page but not the other way around) or any other pertinent links. Previous weeks : 1-10 11-20 21-30 31-40 41-50 51-60 61-70 71-80 81-90 91-100 101-110 111-120 121-130 131-140 Week 1 Week 11 Week 21 Week 31 Week 41 Week 51 Week 61 Week 71 Week 81 Week 91 Week 101 Week 111 Week 121 Week 131 Week 2 Week 12 Week 22 Week 32 Week 42 Week 52 Week 62 Week 72 Week 82 Week 92 Week 102 Week 112 Week 122 Week 132 Week 3 Week 13 Week 23 Week 33 Week 43 Week 53 Week 63 Week 73 Week 83 Week 93 Week 103 Week 113 Week 123 Week 133 Week 4 Week 14 Week 24 Week 34 Week 44 Week 54 Week 64 Week 74 Week 84 Week 94 Week 104 Week 114 Week 124 Week 134 Week 5 Week 15 Week 25 Week 35 Week 45 Week 55 Week 65 Week 75 Week 85 Week 95 Week 105 Week 115 Week 125 Week 135 Week 6 Week 16 Week 26 Week 36 Week 46 Week 56 Week 66 Week 76 Week 86 Week 96 Week 106 Week 116 Week 126 Week 7 Week 17 Week 27 Week 37 Week 47 Week 57 Week 67 Week 77 Week 87 Week 97 Week 107 Week 117 Week 127 Week 8 Week 18 Week 28 Week 38 Week 48 Week 58 Week 68 Week 78 Week 88 Week 98 Week 108 Week 118 Week 128 Week 9 Week 19 Week 29 Week 39 Week 49 Week 59 Week 69 Week 79 Week 89 Week 99 Week 109 Week 119 Week 129 Week 10 Week 20 Week 30 Week 40 Week 50 Week 60 Week 70 Week 80 Week 90 Week 100 Week 110 Week 120 Week 130 Most upvoted papers two weeks ago: /u/CatalyzeX_code_bot: Paper link /u/lauren_v2: paper Besides that, there are no rules, have fun. submitted by /u/ML_WAYR_bot [link] [comments]  ( 1 min )
    [P] Showcase your Machine Learning Research/Projects in Hugging Face Spaces using Gradio
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    [D] How is NVIDIA P100 on Google Colab Pro compared to Laptop with RTX3080 (Mobile, or Max-Q) ?
    submitted by /u/aviisu [link] [comments]  ( 6 min )
    [D] Simple Questions Thread
    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator [link] [comments]  ( 2 min )
    [D] What is the best optimizer to use when visualization inter-net neurons by optimizing random input in relation to it?
    Hi, seemingly it's become a staple in conv net inner-working visualization to put a network in eval mode, sample a random noise image, and optimize the image in relation to the activation of some internal neuron. From what I saw, most examples of this on the internet are using an Adam optimizer for this with a learning rate of 0.1 and a weight decay of 1e-6. This doesn't seem quite right with me, So if any of you know what's the source for this convention and if there are other alternatives I'd appreciate this information very much. Thanks! submitted by /u/ondrea_luciduma [link] [comments]  ( 1 min )
  • Open

    Projects to jump into RL?
    I have some experience with ML but not at all with RL. I know basic theory and want to just get started with the programming part. I saw OpenAI's gym, but I want to learn RL that can be applied anywhere. I don't want to be specifically constrained to OpenAI's gym and want something where I can apply it to game development, such as Unity. Are there any good resources to just do an RL project? submitted by /u/TrepidationTD [link] [comments]  ( 1 min )
    N Step Prioritized Replay Buffer
    I have a few questions about implementing the N Step version of Prioritized Replay Buffer (for Rainbow DQN). I'm implementing the Atari version of this buffer. To conserve memory, I'm only storing the states (and the last state of each episode) in an unstacked manner. That is, if the frame stack is 4, and the shape of states returned by the environment is 4, I'm storing only the last state of the stacked states. This way the buffer contains all transitions from each episode. As for the N Steps part, I'm only calculating the n_step states when getting states from the buffer instead of storing the N Step transitions directly. For the prioritized version, how do the priorities work? If I wasn't trying to conserve memory, I would have stored the N Step stacked states directly and update the priorities for the segment trees and the segment tree pointers only when moving the data from the N Step buffer to the main buffer. But now that I'm calculating the N Step experience directly when sampling and not when adding data to the buffer, when do I update the priorities and the tree pointer? Once per state? Once per stacked state? Once per N Step stacked state? When I update the priorities for the sampled batch, the priorities are associated with multiple states (because the states are stacked). But because I don't store the stacked states and only the raw unstacked states, to which of these states should I update the priority for? And because I don't store the n step transitions anymore, to which of the N Steps should the priorities be associated with? If I want to create such a buffer for a vector env, how would I go about doing it? I'm thinking of maintaining a separate segment tree for each env. Is that correct? Is there a better way? submitted by /u/SirRantcelot [link] [comments]  ( 1 min )
    Confused between "centralized critic" and "centralized training decentralized execution"
    I don't understand if having a centralized critic in multi-agent RL is the same as having a centralized training decentralized execution approach. Can you help me clarify this? submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    There's a "Yo Mama" joke in their somewhere!!
    submitted by /u/boss_007 [link] [comments]  ( 1 min )
    _Algorithms for Decision Making_, Kochenderfer et al 2022 (textbook draft; more classical ML than S&B)
    submitted by /u/gwern [link] [comments]  ( 1 min )
    Solving a large (dynamic) maze using DQN (reward & observation)
    Consider a fixed maze with size (100x400) and fixed start and endpoints (1,1) and (99,399), where doors dynamically appear around the agent every time an observation has been made. The doors disapear after some time step n (for simplicity say n=1). For pitty sake lets say the optimum path length is 1000 steps and there is only one way to reach it. I have two questions: (1) What would be the most appropriate way to frame reward function? I have tested both volatile aproach (i.e. episodeis terminated if agent steps into wall or previosuly made path, otherwise is rewarded in "cheese" every 5 steps) and not-so-volatile approach (i.e. free roam for maximum 2000 steps with cheese reward every 5 steps). Evidently these do not work, are there other perhaps more promising approaches or such large mazes? (2) Does the colour of objects in a frame matter to the (convolutional) DQN's learning of the maze. Considering a 3 channel RGB input, is there perhaps a smarter way to colour code the walls/emptypath/etc.? Its a bit of a weird question but my curiosity arises from short-term memory examples (namely, space invaders) where the input, multiple greyscale frames, is chosen over one rgb frame - so to this degree channels can be understood as higher-level "features" of a frame, hence, can the colours of different objects be optimised? If so is there a logical way of viewing such optimisation? Any other advice is always welcome :) ​ N.B. yes, this could probably be solved with other easier methods but lets just say DQN (or other deep-Q alternatives) is needed. submitted by /u/Background-Cable-491 [link] [comments]  ( 2 min )
    RL for classification problems
    Hello, I aim to use RL in for classification problem, But I can't see where is the difference between using RL and other ML algorithms that are used in classification (such as MLP, KNN, SVM ..) since we have a train phase in which we teach an agent ( or ML algorithm) the classe of each sample of a labeled dataset. Ok the manner of teching is different but the concept is the same. Then, in a second phase, we test the model with a test set. My question is, if I choose RL for classification problem, what is the contribution that we can have compared to another algorithm? submitted by /u/fatenLouati [link] [comments]  ( 2 min )
    Design of next observation when collision for 2d continuous maze
    Hi, I am trying to create a continuous 2d maze environment. I tried several algorithms but no one can give me a stable 1 success rate. From time to time, it gets stuck around the obstacle corner and fails to move anymore. Like the picture shows below. I guess it relates to my bad design for giving the next observation for an action that can't make the agent move forward. Currently, my design evenly separates the action into 10 substeps, and returns the observation which corresponds to the step right before the agent meets an obstacle. It looks like I won't have this issue for mujoco environment like Ant. But it is really hard to see their design. https://preview.redd.it/swohj8h8qev81.png?width=282&format=png&auto=webp&s=7353a62e99ff2141b3660c68ae34ce4f5cd9b94f submitted by /u/AnimatorRemarkable20 [link] [comments]  ( 1 min )
    Why can't we make a perfect AI for Starcraft through evolution
    First of all, let's discuss what the level of AI is now. If the "level" refers to the capability of competing, the current AI has been very closed to the top human player in some types of games, like chess, Texas Poker, and Mahjong of CARDS, DOTA2 of MOBA, as well as StarCraft2 of RTS. As for other games, if we have enough human resources and computing performance, we also can get similar results. If the "level" has other meanings, like AI agents having human behavior, intelligent NPC can be designed specifically for different people so that they can have different gaming experience. These are all at the stage of issue-defining and exploring new technology solutions. Although traditional game AI is mostly based on hard code, it still has much prior knowledge. In recent years, some hot ML-r…  ( 4 min )
  • Open

    Interactive Course on Optimizing Search Engines With Ricardo Baeza-Yates Starting May 10
    Sponsored Post Search systems are in the process of being revolutionized by Deep Learning and AI applications. To successfully evaluate, build, deploy and scale information retrieval systems, engineers working with search systems must understand the frameworks and algorithms that underpin this technology. Professors Ricardo Baeza-Yates (Northeastern University) has done research on information retrieval and web […] The post Interactive Course on Optimizing Search Engines With Ricardo Baeza-Yates Starting May 10 appeared first on Machine Learning Mastery.  ( 2 min )
  • Open

    India and US have decided to advance cooperation in emerging technologies in the fields of communication, artificial intelligence
    submitted by /u/dannylenwinn [link] [comments]  ( 1 min )
    Research on AI!
    Dear all, Currently I am writing my thesis on the effects of Artificial Intelligence (AI) on employee performance through employee engagement. If you work in a company that uses AI and you: - have a direct relationship (e.g. data scientist) or indirect relationship (e.g. business manager) with AI - often or sometimes use AI in your daily work then your insights are essential for my master thesis research and I would really appreciate it if you would fill out the survey below (5-10 min / English & Dutch translation). https://uva.fra1.qualtrics.com/jfe/form/SV_4TTnzJE4IQUoHWK Please feel free to spread the survey to as many relevant people you may know. Much appreciated!! Britt submitted by /u/BrittHermans [link] [comments]  ( 1 min )
    AI Dream 44 - Epic Cathedral Supernatural Visit
    submitted by /u/LordPewPew777 [link] [comments]
    In the deep
    submitted by /u/Hacknaut [link] [comments]
    Nota AI Introduces New Machine Learning Tools Under Its NetsPresso Platform For Automatically Searching Optimized Models And Making Compression Process Easy And Fast
    ​ https://preview.redd.it/beo6czc6hhv81.png?width=1706&format=png&auto=webp&s=385f65ebfed9344781d1f9238d8d508dae27c0a3 In the last decade, AI research has brought astonishing results in many fields, and, undoubtedly, AI is nowadays a central technology in many aspects of our life. As new ideas are proposed every day, this continuous research usually comes with infinite applications: from the algorithms assisting surgeons in complex operations to the one which allows unlocking our phone using just our face. In this evolution from the idea to the actual implementation, it is often ignored how hard the passage between theoretical research and working application is. We can refer to this process as AI Development Cycle for Edge AI and can be divided into three phases related to 1) data, 2) model, and 3) evaluation. Many aspects must be considered: first, each different AI application requires a specific dataset. For this reason, in this step, the aim is to prepare the data, which, as is well known, is one of the crucial topics of AI: a good algorithm always relies on a good dataset. This phase can be divided into data collection, curation, labeling, and preparation. Continue reading submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    [D] MindSpore AI Scientific Computing Series (15): Protein Function Prediction
    submitted by /u/Creative_Habit_6868 [link] [comments]
    Training an AI Hitman To Find Waldo
    submitted by /u/TernaryJimbo [link] [comments]
    Kanye wes t thro the wire
    ​ https://preview.redd.it/qaqevbqlfev81.png?width=1024&format=png&auto=webp&s=8c2759d29a3713c2ffb998e20e8995cee4d8b2b4 submitted by /u/Smek_dev [link] [comments]
  • Open

    Nota AI Introduces New Machine Learning Tools Under Its NetsPresso Platform For Automatically Searching Optimized Models And Making Compression Process Easy And Fast
    In the last decade, AI research has brought astonishing results in many fields, and, undoubtedly, AI is nowadays a central technology in many aspects of our life. As new ideas are proposed every day, this continuous research usually comes with infinite applications: from the algorithms assisting surgeons in complex operations to the one which allows unlocking our phone using just our face. In this evolution from the idea to the actual implementation, it is often ignored how hard the passage between theoretical research and working application is. We can refer to this process as AI Development Cycle for Edge AI and can be divided into three phases related to 1) data, 2) model, and 3) evaluation. Many aspects must be considered: first, each different AI application requires a specific dataset. For this reason, in this step, the aim is to prepare the data, which, as is well known, is one of the crucial topics of AI: a good algorithm always relies on a good dataset. This phase can be divided into data collection, curation, labeling, and preparation. Continue reading submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )

  • Open

    Time Series Analysis for air pollution data not aligned [R] [P]
    This is about a project that I am working on; Hope the ML community can help me! I have collected few hours of air pollutants data using Aeroqual sensors and custom made sensors. 3 types of data is available in the project; aeroqual, custom, council data. Where council data can be taken for granted (It comes from the govt installed high spec sensor). Aeroqual is a commercial sensor manufacturing company, its data should be accurate. The first part of the project is about checking the accuracy of custom sensor. So, I have done few analysis on the data; and found that custom sensor data has similarity (but not same, there are so much variation in the custom sensor data) with council sensor data but aeroqual data is way different. I am attaching the plot below which I have done. ​ So I need to know is there any method that I can find relationship between these three datasets? Is it possible to make these data align togather? I need to build an ML model to predict the air pollutant level using this data. any tips for getting this thing working? - Thanks in advance ​ https://preview.redd.it/fzu7dbsz0dv81.png?width=885&format=png&auto=webp&s=516d65fe3290ac8a28159547880f9dc972922b64 submitted by /u/Codename_17 [link] [comments]  ( 1 min )
    [D] For training a HAAR cascade is it better to manually remove noise from positive training images or to leave it in so the data is more realistic?
    submitted by /u/Counter-Business [link] [comments]  ( 1 min )
    [D] How to convert papers to code?
    My problem is probably what you have guessed: it's understanding the technical specifications which are usually written in a non-coding-friendly way. Sometimes crucial information is completely missing from the paper ex: loss function description for a DL algorithm. For the lucky cases where there are already available implementations on github to a given paper, usually they are either very distinguishable from each other in terms of code structure which questions their validity or whether they match what the paper authors intended specially with varying measurable results, or they are almost exact copies from one another. There are numerous examples where I can show specific papers with varying degrees of complexity, and discuss why the conversion can be tricky but they may require standalone discussions themselves, likely outside the scope of this one. Is there a way to approach the problem assuming the absence of reference code? submitted by /u/shine-box [link] [comments]  ( 2 min )
    Open Source Model For Identifying Extremism Online [Project]
    submitted by /u/OppositeMonday [link] [comments]
    [P] Tired of manually sending minutes of meeting
    I host an important org level meeting (~100 attendees) every week, and need to share minutes after the meeting. I am so tired of listening to conversations again just to capture important points, summarise discussion and action items. Is there any model/api which can help me do that? I use Amazon transcribe to generate transcripts, which helps, but it is not very accurate. For me the priority would be: 1) Model/api which is better than Amazon transcribe 2) Auto Identify speakers / speaker diarization (since mostly the same set of people speak) 3) Summarise the conversations into topics (we have time and agenda based discussion) I am sure this might be a problem across the industry since most of the meetings happen online, and someone wastes hours after meeting to send notes. I did find some tools which summarise the transcript, but i need to auto send in a specific format and identify topics based on conversion (maybe we can input the agenda in advance). Also this is private information, so I need something on premise, hence looking for a repo or model which i can use to build something on top. Please let me know if something exists or someone working on similar projects. Happy to collaborate and contribute. submitted by /u/super_commando-dhruv [link] [comments]  ( 1 min )
    [R] ?? Can you find out which news article is written by AI ??
    This research will test the human ability to distinguish human written text from text generated by artificial intelligence. Participating will only take 10 minutes. You will receive 2 short news articles about the same topic. One will be written by a human, the other one will be generated by artificial intelligence. It is up to you to find out which one is written by artificial intelligence. You will be asked to do this for four different subjects, namely: Science, Economics & Politics, Society and Sports. At the end of the survey you will receive feedback on how well you have performed. The human written articles were collected from various news websites. The Articles created by artificial intelligence were generated using GPT-3 from OpenAI. Purpose of the research: We are trying to find out how well GPT-3 performs across subjects. Are there any subject GPT-3 is better at writing about, or is he equally good across all subjects. Secondly we are testing the ability of GPT-3 to generate articles about events that happened after the training of the model. You can participate by clicking on the link below, thank you very much for your participation. https://vub.fra1.qualtrics.com/jfe/form/SV_b2E9f6hGxNDH13M submitted by /u/RobinSandersVUB [link] [comments]  ( 1 min )
    [R] I need to run >2000 experiments for my PhD work. How much would 2000 GPUs for 1 day cost?
    2000 GPUs and 8000 CPUs. And where could I even get such a vast affordance? submitted by /u/samlerman [link] [comments]  ( 2 min )
    [P] Vectorflow is a minimalist neural network library optimized for sparse data and single machine environments open sourced by Netflix
    submitted by /u/ur_mum_goes_to_uni [link] [comments]
    [Project] Face detection algorithms comparison
    I selected 5 ready-made algorithms for face detection and compared them with each other by such metrics as Precision, Recall, IOU and time on the dataset I marked up. I am ready to accept your Pull Request with your solutions(algorithms) and results! GitHub: https://github.com/wb-08/face-detection-algorithms-comparison Blog post: https://habr.com/ru/post/661671/ submitted by /u/wb-08 [link] [comments]
    [Discussion] Writing production grade code for ML in python
    I have been interviewing for a machine learning lead position. I have successfully passed 3 interview rounds (coding , HR, system design). I have my final interview with the VP of Engineering. When asked how best to prepare myself, they said they would like to test my ability to write "production quality" code in python. While I do have some experience, the downside is I worked in small R&D teams for a long time. Though I am knowledgeable in python, perhaps, I might have not followed all the industry best practices. If you are a hiring manager or interviewer, how would you test this ability? How do I prepare myself to prove my ability to write production grade code? Thank you all so much in advance. submitted by /u/mbkv [link] [comments]  ( 4 min )
    [D] Comparing the efficiency of different GAN models
    I'm comparing different GAN models (CGan, DCGan, WGan, StyleGan) in tensorflow2. In general, I want to use the images that I generate with the generator to train a classifier while being as realistic as possible. At first, I wanted to let them train for 24 hours each, define some early stopping criteria and save the checkpoints with the lowest loss through a callback. But it seems that the lower loss does not always lead to more realistic images. So how do I compare the different models in a scientific way? Because the results highly depend on the epoch I choose and my subjective feeling, which images look the best. submitted by /u/Bonkikong [link] [comments]  ( 1 min )
    [P], Artificial Nightmares: Split Personality || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    https://www.youtube.com/watch?v=2E_6ARbrMmc submitted by /u/Thenamessd [link] [comments]
    [N] Google's new AI image analysis is pretty LiT - and beats OpenAI's CLIP
    submitted by /u/much_successes [link] [comments]
    [P] A Simpler @PyTorch Annotated Implementation of EleutherAI's 20B Language Model GPT-NeoX.
    Github: https://github.com/labmlai/neox Annotated implementation: https://lit.labml.ai/github/labmlai/neox/tree/main/src/neox/__init__.py Original repo from EleutherAI: https://github.com/EleutherAI/gpt-neox We have included samples showing how to generate text and to fine-tune. We haven't included a bunch of optimizations that were present in original GPT-NeoX to keep things simple. submitted by /u/hnipun [link] [comments]  ( 1 min )
    [P] treequeues: transfert jax pytrees between processes with very high speed!
    Hello! If you are using jax and you need to pass some pytrees between processes, I may have something for you :) I developed a "treequeue". It is a queue that is made for pytree's nested arrays. The transfer speed is up to 10 times higher than regular queues. This is done by utilizing shared memory arrays and avoiding pickling data. This can be very useful when developing distributed architecture, e.g. distributed reinforcement learning where speed is at the upmost importance. In my case this implementation was very useful to remove bottlenecks when implementing PBT algorithms! https://github.com/thomashirtz/treequeues Cheers! submitted by /u/krenast [link] [comments]  ( 1 min )
    [D] ‘auton-survival’ package for deep survival analysis and time to event regression from CMU.
    Comes with ‘white paper’ and example notebooks… seems legit..? Anyone tried this out yet? Github Paper] submitted by /u/proportional-hazard [link] [comments]
    [P] Unofficial ViT-VQGAN implementation
    I know that many people (including me) were surprised after seeing the image quality of ViT-VQGAN and disappointed to know there won't be no source code released. Therefore, I've decided to implement it by myself and here is the code. I hope this can help everyone as a starting point for ViT-VQGAN. submitted by /u/ThunaClone [link] [comments]
    [R][P] StyleGAN-Human: A Data-Centric Odyssey of Human Generation + Gradio Web Demo
    submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 1 min )
    [D] Review of end-to-end multi-modal deep learning approach for autonomous navigation
    In reviewing various approaches to end-to-end deep learning for autonomous driving, I've come across an interesting approach in this paper that I would like to discuss with others... I will begin by summarizing the approach: ​ A ResNet50 architecture is used as an encoder network with the input being an RGB image + depth map concatenated as (224 x 224 x 4). In the paper it is argued that a point cloud can also be used, or some other sensor modality would also work The encoder network output (feature map of 7 x 7 x 2048) is fed into a decoder network that takes it back to (224 x 224 x 5) with pixel wise semantic segmentation of 5 classes: lane, road line, sidewalk, vehicles or pedestrians, and others That same encoder output (feature map of 7 x 7 x 2048) is global average pooled to 2…  ( 2 min )
  • Open

    mGPT: Few-Shot Learners Go Multilingual
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    Biological feedback will save us all
    Dall-E-2. Excellent. It's very high quality. But it's a combination of the data. ​ What did we want? We wanted some amazing work that made us cry with just one line of writing or one image. ​ "Oh, copied it well. It's pretty much the same." It's not enough. But how can that be improve? I think the answer is the feedback method. ​ ​ The current evaluation method of writing, image, video, and sound is too indirect. ​ Sales revenue Number of Subscribers Number of views Like / Dislike Ratings by section, Revisit Rate <<< Those are better than others Emotion analysis of Comments using AI Internal staff scores ​ There are so many conditions other than the quality of contents that people's judgment can intervene in. In the first place, people don't express exactly what they…  ( 3 min )
    16 images generated for text prompt "Woah there, Dragonman!" using a text-to-image AI model from CompVis that uses latent diffusion (crosspost of another user's post)
    submitted by /u/Wiskkey [link] [comments]  ( 1 min )
    NVIDIA Instant NeRF: Turn Photos into 3D Scenes in Milliseconds ! Video demo
    submitted by /u/OnlyProggingForFun [link] [comments]  ( 1 min )
    GOOGLE researchers create animated avatars from a single photo
    submitted by /u/SpatialComputing [link] [comments]  ( 1 min )
    MIT's new machine-learning system M2I may someday help driverless cars predict the next moves of others
    submitted by /u/qptbook [link] [comments]
    Artificial Nightmares: Split Personality || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
    Human Like AI where should i start
    Hello there, if one would want to get into AI and especially human like AI, would you still recommend getting into machine learning first? As far as i know machine learning doesnt even try to develop "human like" AI/"bottom up AI", but rather focuses on training algorythms to solve specific problems. I know human like AI is something thats highly complex and we still need years if not even decades to achieve something even close to it but i would appreciate tips and ideas nonetheless. (after reading through my question again this sounds like a generic question thats being asked here everyday, if thats the case please send me a link to a similar post if there is one :) ) submitted by /u/Garic152 [link] [comments]  ( 1 min )
    help with a project idea
    Hi everyone Im doing a project with my friends where we should use computer vision/iot to create a solution for people with disabilities or in the healthcare system Any ideas please submitted by /u/armyy__ [link] [comments]
    Meta AI Researchers Built An End-To-End Machine Learning Platform Called Looper, With Easy-To-Use APIs For Decision-Making And Feedback Collection
    From improving the user experience to making the computational infrastructure more effective, AI is a crucial aspect of making current software systems and products perform as well as possible. AI is often more effective than even precisely developed human-crafted heuristic tactics today, whether it’s reducing latency, boosting the quality of a video stream, or streamlining the interfaces to match a specific person’s demands. But, to use AI more effectively in various products, several challenges must be addressed: the system must accommodate software engineers without machine learning backgrounds; it must provide mechanisms to optimize for a variety of product goals, which may differ from closed-form machine learning loss functions; it must distinguish causal connections from data correlations; and it must scale efficiently to train, host, and monitor vast numbers of AI models. Meta Researchers Develop ‘Looper,’ an end-to-end AI platform that has been designed with easy-to-use APIs for optimization, personalization, and feedback collecting to answer these needs. Looper may be used to support the entire machine learning lifecycle, from model training to deployment and inference to product evaluation and optimization. Looper allows us to modify the existing products to leverage AI for personalized optimizations rather than having to rebuild them around AI models. Currently, the Looper platform hosts 700 AI models and produces 4 million AI outputs every second. Continue reading Paper: https://arxiv.org/pdf/2110.07554.pdf submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    Are there any programs that can output a sentence based on input sentences?
    I'm looking to create a way to automate original story ideas based on previous ideas. I want to be able to input 1000+ original sentences and have an output of an original sentence that is inspired the previous ones. Are there any programs that can do this or will I need to develop my own? submitted by /u/yea_okay_dude [link] [comments]  ( 1 min )
    Ultimate Guide to Activation Functions
    submitted by /u/SirFletch [link] [comments]
  • Open

    How to stop stable baseline model during the training exactly at the end of frame?
    I am training PPO2 model on stable-baseline library. I have tabular data with 15000 rows, thus length of the episodes is 15000. I am using nminibatches=4, n_envs=1. For example, I have set total_timesteps=10000. During the training process agent will see 15000 rows several times and updates actions for each rows, but in some particular point, the rest of the time total_timesteps will not be enough to see the full episode, and only part of episodes is available in the last step of learning. To be concrete. For simplicity, lets say we have 10 raws, 23 total_timesteps. The agent will see the full episode 2 times, and only the first 3 rows in the third times and rest of the 7 raws have not seen during last step. I want to stop the learning process when Agent reaches the last time full episodes (above example stop learning at when total_timesteps=20) or define total_timesteps in such a way to see full episodes at the end of the training step. submitted by /u/Mariam_Dundua [link] [comments]  ( 1 min )
    New to RL
    Hello guys, I am pretty new to the rl field and write now i am doing my thesis in it. I've come across a problem in my code. I created a custom environment and when i am trying to solve it with my dqn agent using stable baselines3, I am able to execute the code and print out the required things but the agent is not learning. Any help ? thanks. submitted by /u/last_2_brain_cells97 [link] [comments]  ( 1 min )
    Questions on policy gradients
    Hi guys, I am new to RL and reading tutorial of spinning up which focus on policy based algorithms. In the derivation of VPG, the tutorial said"The environment has no dependence on /theta(the parameter of policy), so gradients of R(/tau)(total return of the trajectory) with respect of /theta is 0. However, the trajectory depends on our policy, and our policy depends on /theta. As a result, I am confused why total return of trajectory is independent from /theta. submitted by /u/SkyRimT [link] [comments]  ( 2 min )
    Vicarious exits: acquihired by Google robotics (Intrinsic) & DeepMind
    submitted by /u/gwern [link] [comments]
  • Open

    GOOGLE researchers create animated avatars from a single photo
    submitted by /u/SpatialComputing [link] [comments]  ( 1 min )
    I don't understand why I am getting NaN loss scores. Can anyone explain what I am doing wrong ?
    submitted by /u/brike3 [link] [comments]  ( 1 min )
    Are there applications of neural networks other than machine learning?
    I see lots of hardware oriented toward AI/ML stuff these days, including chips with hardware acceleration for neural networks. I'm thinking about how GPUs were initially designed for graphics calculations, but then things like CUDA and OpenCL were developed to make that hardware usable for broader applications of parallel processing. Are there any other things that you can do with a neural network besides backpropagation, that wouldn't be easier to do in other ways? submitted by /u/Bananawamajama [link] [comments]  ( 1 min )
  • Open

    My Paper Reviewing Load
    In academia, for better or worse, we have what’s called a peer review system, where papers get accepted to journals, conferences, or other venues on the basis of reviews from other researchers, who ideally are subject area experts and thus are qualified to evaluate the paper. The reviewers also cannot have a conflict of interest with the authors, and should not be overwhelmed with too many papers to review. This is the ideal world, and is not always what happens in practice. From my experience in the robotics academic community (and this may apply to other disciplines), it generally seems like there is no standard definition of an “appropriate” or “maximum” reviewing load for a reviewer. This is difficult to define as different papers mandate different reviewing efforts; a massive journal …  ( 4 min )
  • Open

    A manifold learning approach for gesture recognition from micro-Doppler radar measurements. (arXiv:2110.01670v4 [cs.LG] UPDATED)
    A recent paper (Neural Networks, {\bf 132} (2020), 253-268) introduces a straightforward and simple kernel based approximation for manifold learning that does not require the knowledge of anything about the manifold, except for its dimension. In this paper, we examine how the pointwise error in approximation using least squares optimization based on similarly localized kernels depends upon the data characteristics and deteriorates as one goes away from the training data. The theory is presented with an abstract localized kernel, which can utilize any prior knowledge about the data being located on an unknown sub-manifold of a known manifold. We demonstrate the performance of our approach using a publicly available micro-Doppler data set, and investigate the use of different preprocessing measures, kernels, and manifold dimensions. Specifically, it is shown that the localized kernel introduced in the above mentioned paper when used with PCA components leads to a near-competitive performance to deep neural networks, and offers significant improvements in training speed and memory requirements. To demonstrate the fact that our methods are agnostic to the domain knowledge, we examine the classification problem in a simple video data set.
    Bayesian Learning via Neural Schr\"odinger-F\"ollmer Flows. (arXiv:2111.10510v8 [stat.ML] UPDATED)
    In this work we explore a new framework for approximate Bayesian inference in large datasets based on stochastic control (i.e. Schr\"odinger bridges). We advocate stochastic control as a finite time and low variance alternative to popular steady-state methods such as stochastic gradient Langevin dynamics (SGLD). Furthermore, we discuss and adapt the existing theoretical guarantees of this framework and establish connections to already existing VI routines in SDE-based models.
    Accurate detection of sepsis at ED triage using machine learning with clinical natural language processing. (arXiv:2204.07657v2 [cs.LG] UPDATED)
    Sepsis is a life-threatening condition with organ dysfunction and is a leading cause of death and critical illness worldwide. Accurate detection of sepsis during emergency department triage would allow early initiation of lab analysis, antibiotic administration, and other sepsis treatment protocols. The purpose of this study was to determine whether EHR data can be extracted and synthesized with the latest machine learning algorithms (KATE Sepsis) and clinical natural language processing to produce accurate sepsis models, and compare KATE Sepsis performance with existing sepsis screening protocols, such as SIRS and qSOFA. A machine learning model (KATE Sepsis) was developed using patient encounters with triage data from 16 participating hospitals. KATE Sepsis, SIRS, standard screening (SIRS with source of infection) and qSOFA were tested in three settings. Cohort-A was a retrospective analysis on medical records from a single Site 1. Cohort-B was a prospective analysis of Site 1. Cohort-C was a retrospective analysis on Site 1 with 15 additional sites. Across all cohorts, KATE Sepsis demonstrates an AUC of 0.94-0.963 with 73-74.87% TPR and 3.76-7.17% FPR. Standard screening demonstrates an AUC of 0.682-0.726 with 39.39-51.19% TPR and 2.9-6.02% FPR. The qSOFA protocol demonstrates an AUC of 0.544-0.56, with 10.52-13.18% TPR and 1.22-1.68% FPR. For severe sepsis, across all cohorts, KATE Sepsis demonstrates an AUC of 0.935-0.972 with 70-82.26% TPR and 4.64-8.62% FPR. For septic shock, across all cohorts, KATE Sepsis demonstrates an AUC of 0.96-0.981 with 85.71-89.66% TPR and 4.85-8.8% FPR. SIRS, standard screening, and qSOFA demonstrate low AUC and TPR for severe sepsis and septic shock detection. KATE Sepsis provided substantially better sepsis detection performance in triage than commonly used screening protocols.
    Visual Attention Methods in Deep Learning: An In-Depth Survey. (arXiv:2204.07756v2 [cs.CV] UPDATED)
    Inspired by the human cognitive system, attention is a mechanism that imitates the human cognitive awareness about specific information, amplifying critical details to focus more on the essential aspects of data. Deep learning has employed attention to boost performance for many applications. Interestingly, the same attention design can suit processing different data modalities and can easily be incorporated into large networks. Furthermore, multiple complementary attention mechanisms can be incorporated in one network. Hence, attention techniques have become extremely attractive. However, the literature lacks a comprehensive survey specific to attention techniques to guide researchers in employing attention in their deep models. Note that, besides being demanding in terms of training data and computational resources, transformers only cover a single category in self-attention out of the many categories available. We fill this gap and provide an in-depth survey of 50 attention techniques categorizing them by their most prominent features. We initiate our discussion by introducing the fundamental concepts behind the success of attention mechanism. Next, we furnish some essentials such as the strengths and limitations of each attention category, describe their fundamental building blocks, basic formulations with primary usage, and applications specifically for computer vision. We also discuss the challenges and open questions related to attention mechanism in general. Finally, we recommend possible future research directions for deep attention.
    On Distribution Shift in Learning-based Bug Detectors. (arXiv:2204.10049v1 [cs.LG])
    Deep learning has recently achieved initial success in program analysis tasks such as bug detection. Lacking real bugs, most existing works construct training and test data by injecting synthetic bugs into correct programs. Despite achieving high test accuracy (e.g. >90%), the resulting bug detectors are found to be surprisingly unusable in practice, i.e., <10% precision when used to scan real software repositories. In this work, we argue that this massive performance difference is caused by distribution shift, i.e., a fundamental mismatch between the real bug distribution and the synthetic bug distribution used to train and evaluate the detectors. To address this key challenge, we propose to train a bug detector in two phases, first on a synthetic bug distribution to adapt the model to the bug detection domain, and then on a real bug distribution to drive the model towards the real distribution. During these two phases, we leverage a multi-task hierarchy, focal loss, and contrastive learning to further boost performance. We evaluate our approach extensively on three widely studied bug types, for which we construct new datasets carefully designed to capture the real bug distribution. The results demonstrate that our approach is practically effective and successfully mitigates the distribution shift: our learned detectors are highly performant on both our constructed test set and the latest version of open source repositories.
    Persua: A Visual Interactive System to Enhance the Persuasiveness of Arguments in Online Discussion. (arXiv:2204.07741v2 [cs.HC] UPDATED)
    Persuading people to change their opinions is a common practice in online discussion forums on topics ranging from political campaigns to relationship consultation. Enhancing people's ability to write persuasive arguments could not only practice their critical thinking and reasoning but also contribute to the effectiveness and civility in online communication. It is, however, not an easy task in online discussion settings where written words are the primary communication channel. In this paper, we derived four design goals for a tool that helps users improve the persuasiveness of arguments in online discussions through a survey with 123 online forum users and interviews with five debating experts. To satisfy these design goals, we analyzed and built a labeled dataset of fine-grained persuasive strategies (i.e., logos, pathos, ethos, and evidence) in 164 arguments with high ratings on persuasiveness from ChangeMyView, a popular online discussion forum. We then designed an interactive visual system, Persua, which provides example-based guidance on persuasive strategies to enhance the persuasiveness of arguments. In particular, the system constructs portfolios of arguments based on different persuasive strategies applied to a given discussion topic. It then presents concrete examples based on the difference between the portfolios of user input and high-quality arguments in the dataset. A between-subjects study shows suggestive evidence that Persua encourages users to submit more times for feedback and helps users improve more on the persuasiveness of their arguments than a baseline system. Finally, a set of design considerations was summarized to guide future intelligent systems that improve the persuasiveness in text.
    Learning to Hash Naturally Sorts. (arXiv:2201.13322v2 [cs.CV] UPDATED)
    Learning to hash pictures a list-wise sorting problem. Its testing metrics, e.g., mean-average precision, count on a sorted candidate list ordered by pair-wise code similarity. However, scarcely does one train a deep hashing model with the sorted results end-to-end because of the non-differentiable nature of the sorting operation. This inconsistency in the objectives of training and test may lead to sub-optimal performance since the training loss often fails to reflect the actual retrieval metric. In this paper, we tackle this problem by introducing Naturally-Sorted Hashing (NSH). We sort the Hamming distances of samples' hash codes and accordingly gather their latent representations for self-supervised training. Thanks to the recent advances in differentiable sorting approximations, the hash head receives gradients from the sorter so that the hash encoder can be optimized along with the training procedure. Additionally, we describe a novel Sorted Noise-Contrastive Estimation (SortedNCE) loss that selectively picks positive and negative samples for contrastive learning, which allows NSH to mine data semantic relations during training in an unsupervised manner. Our extensive experiments show the proposed NSH model significantly outperforms the existing unsupervised hashing methods on three benchmarked datasets.
    Random Dilated Shapelet Transform: A New Approach for Time Series Shapelets. (arXiv:2109.13514v2 [cs.CV] UPDATED)
    Shapelet-based algorithms are widely used for time series classification because of their ease of interpretation, but they are currently outperformed by recent state-of-the-art approaches. We present a new formulation of time series shapelets including the notion of dilation, and we introduce a new shapelet feature to enhance their discriminative power for classification. Experiments performed on 112 datasets show that our method improves on the state-of-the-art shapelet algorithm, and achieves comparable accuracy to recent state-of-the-art approaches, without sacrificing neither scalability, nor interpretability.
    Backplay: "Man muss immer umkehren". (arXiv:1807.06919v5 [cs.LG] UPDATED)
    Model-free reinforcement learning (RL) requires a large number of trials to learn a good policy, especially in environments with sparse rewards. We explore a method to improve the sample efficiency when we have access to demonstrations. Our approach, Backplay, uses a single demonstration to construct a curriculum for a given task. Rather than starting each training episode in the environment's fixed initial state, we start the agent near the end of the demonstration and move the starting point backwards during the course of training until we reach the initial state. Our contributions are that we analytically characterize the types of environments where Backplay can improve training speed, demonstrate the effectiveness of Backplay both in large grid worlds and a complex four player zero-sum game (Pommerman), and show that Backplay compares favorably to other competitive methods known to improve sample efficiency. This includes reward shaping, behavioral cloning, and reverse curriculum generation.
    Deep learning techniques for energy clustering in the CMS ECAL. (arXiv:2204.10277v1 [hep-ex])
    The reconstruction of electrons and photons in CMS depends on topological clustering of the energy deposited by an incident particle in different crystals of the electromagnetic calorimeter (ECAL). These clusters are formed by aggregating neighbouring crystals according to the expected topology of an electromagnetic shower in the ECAL. The presence of upstream material (beampipe, tracker and support structures) causes electrons and photons to start showering before reaching the calorimeter. This effect, combined with the 3.8T CMS magnetic field, leads to energy being spread in several clusters around the primary one. It is essential to recover the energy contained in these satellite clusters in order to achieve the best possible energy resolution for physics analyses. Historically satellite clusters have been associated to the primary cluster using a purely topological algorithm which does not attempt to remove spurious energy deposits from additional pileup interactions (PU). The performance of this algorithm is expected to degrade during LHC Run 3 (2022+) because of the larger average PU levels and the increasing levels of noise due to the ageing of the ECAL detector. New methods are being investigated that exploit state-of-the-art deep learning architectures like Graph Neural Networks (GNN) and self-attention algorithms. These more sophisticated models improve the energy collection and are more resilient to PU and noise, helping to preserve the electron and photon energy resolution achieved during LHC Runs 1 and 2. This work will cover the challenges of training the models as well the opportunity that this new approach offers to unify the ECAL energy measurement with the particle identification steps used in the global CMS photon and electron reconstruction.
    Condition Monitoring of Transformer Bushings Using Computational Intelligence. (arXiv:2204.10193v1 [cs.LG])
    Dissolved Gas-in-oil analysis (DGA) is used to monitor the condition of bushings on large power transformers. There are different techniques used in determining the conditions from the data collected, but in this work the Artificial Intelligence techniques are investigated. This work investigates which gases in DGA are related to each other and which ones are important for making decisions. When the related and crucial gases are determined, the other gases are discarded thereby reducing the number of attributes in DGA. Hence a further investigation is done to see how these new datasets influence the performance of the classifiers used to classify the DGA of full attributes. The classifiers used in these experiments were Backpropagation Neural Networks (BPNN) and Support Vector Machines (SVM) whereas the Principal Component Analysis (PCA), Rough Set (RS), Incremental Granular Ranking (GR++) and Decision Trees (DT) were used to reduce the attributes of the dataset. The parameters used when training the BPNN and SVM classifiers are kept fixed to create a controlled test environment when investigating the effects of reducing the number of gases. This work further introduced a new classifier that can handle high dimension dataset and noisy dataset, Rough Neural Network (RNN).
    Geometry-Aware Supertagging with Heterogeneous Dynamic Convolutions. (arXiv:2203.12235v2 [cs.CL] UPDATED)
    The syntactic categories of categorial grammar formalisms are structured units made of smaller, indivisible primitives, bound together by the underlying grammar's category formation rules. In the trending approach of constructive supertagging, neural models are increasingly made aware of the internal category structure, which in turn enables them to more reliably predict rare and out-of-vocabulary categories, with significant implications for grammars previously deemed too complex to find practical use. In this work, we revisit constructive supertagging from a graph-theoretic perspective, and propose a framework based on heterogeneous dynamic graph convolutions aimed at exploiting the distinctive structure of a supertagger's output space. We test our approach on a number of categorial grammar datasets spanning different languages and grammar formalisms, achieving substantial improvements over previous state of the art scores. Code will be made available at https://github.com/konstantinosKokos/dynamic-graph-supertagging
    Hybrid Cloud-Edge Collaborative Data Anomaly Detection in Industrial Sensor Networks. (arXiv:2204.09942v1 [cs.CR])
    Industrial control systems (ICSs) are facing increasing cyber-physical attacks that can cause catastrophes in the physical system. Efficient anomaly detection models in the industrial sensor networks are essential for enhancing ICS reliability and security, due to the sensor data is related to the operational state of the ICS. Considering the limited availability of computing resources, this paper proposes a hybrid anomaly detection approach in cloud-edge collaboration industrial sensor networks. The hybrid approach consists of sensor data detection models deployed at the edges and a sensor data analysis model deployed in the cloud. The sensor data detection model based on Gaussian and Bayesian algorithms can detect the anomalous sensor data in real-time and upload them to the cloud for further analysis, filtering the normal sensor data and reducing traffic load. The sensor data analysis model based on Graph convolutional network, Residual algorithm and Long short-term memory network (GCRL) can effectively extract the spatial and temporal features and then identify the attack precisely. The proposed hybrid anomaly detection approach is evaluated using a benchmark dataset and baseline anomaly detection models. The experimental results show that the proposed approach can achieve an overall 11.19% increase in Recall and an impressive 14.29% improvement in F1-score, compared with the existing models.
    Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data. (arXiv:2009.09139v3 [cs.LG] UPDATED)
    Multi-Task Learning (MTL) networks have emerged as a promising method for transferring learned knowledge across different tasks. However, MTL must deal with challenges such as: overfitting to low resource tasks, catastrophic forgetting, and negative task transfer, or learning interference. Often, in Natural Language Processing (NLP), a separate model per task is needed to obtain the best performance. However, many fine-tuning approaches are both parameter inefficient, i.e., potentially involving one new model per task, and highly susceptible to losing knowledge acquired during pretraining. We propose a novel Transformer architecture consisting of a new conditional attention mechanism as well as a set of task-conditioned modules that facilitate weight sharing. Through this construction (a hypernetwork adapter), we achieve more efficient parameter sharing and mitigate forgetting by keeping half of the weights of a pretrained model fixed. We also use a new multi-task data sampling strategy to mitigate the negative effects of data imbalance across tasks. Using this approach, we are able to surpass single task fine-tuning methods while being parameter and data efficient (using around 66% of the data for weight updates). Compared to other BERT Large methods on GLUE, our 8-task model surpasses other Adapter methods by 2.8% and our 24-task model outperforms by 0.7-1.0% models that use MTL and single task fine-tuning. We show that a larger variant of our single multi-task model approach performs competitively across 26 NLP tasks and yields state-of-the-art results on a number of test and development sets. Our code is publicly available at https://github.com/CAMTL/CA-MTL.
    The Silent Problem -- Machine Learning Model Failure -- How to Diagnose and Fix Ailing Machine Learning Models. (arXiv:2204.10227v1 [cs.LG])
    The COVID-19 pandemic has dramatically changed how healthcare is delivered to patients, how patients interact with healthcare providers, and how healthcare information is disseminated to both healthcare providers and patients. Analytical models that were trained and tested pre-pandemic may no longer be performing up to expectations, providing unreliable and irrelevant learning (ML) models given that ML depends on the basic principle that what happened in the past are likely to repeat in the future. ML faced to two important degradation principles, concept drift, when the underlying properties and characteristics of the variables change and data drift, when the data distributions, probabilities, co-variates, and other variable relationships change, both of which are prime culprits of model failure. Therefore, detecting and diagnosing drift in existing models is something that has become an imperative. And perhaps even more important is a shift in our mindset towards a conscious recognition that drift is inevitable, and model building must incorporate intentional resilience, the ability to offset and recover quickly from failure, and proactive robustness, avoiding failure by developing models that are less vulnerable to drift and disruption.
    A Revealing Large-Scale Evaluation of Unsupervised Anomaly Detection Algorithms. (arXiv:2204.09825v1 [cs.LG])
    Anomaly detection has many applications ranging from bank-fraud detection and cyber-threat detection to equipment maintenance and health monitoring. However, choosing a suitable algorithm for a given application remains a challenging design decision, often informed by the literature on anomaly detection algorithms. We extensively reviewed twelve of the most popular unsupervised anomaly detection methods. We observed that, so far, they have been compared using inconsistent protocols - the choice of the class of interest or the positive class, the split of training and test data, and the choice of hyperparameters - leading to ambiguous evaluations. This observation led us to define a coherent evaluation protocol which we then used to produce an updated and more precise picture of the relative performance of the twelve methods on five widely used tabular datasets. While our evaluation cannot pinpoint a method that outperforms all the others on all datasets, it identifies those that stand out and revise misconceived knowledge about their relative performances.
    Multi-label classification for biomedical literature: an overview of the BioCreative VII LitCovid Track for COVID-19 literature topic annotations. (arXiv:2204.09781v1 [cs.DL])
    The COVID-19 pandemic has been severely impacting global society since December 2019. Massive research has been undertaken to understand the characteristics of the virus and design vaccines and drugs. The related findings have been reported in biomedical literature at a rate of about 10,000 articles on COVID-19 per month. Such rapid growth significantly challenges manual curation and interpretation. For instance, LitCovid is a literature database of COVID-19-related articles in PubMed, which has accumulated more than 200,000 articles with millions of accesses each month by users worldwide. One primary curation task is to assign up to eight topics (e.g., Diagnosis and Treatment) to the articles in LitCovid. Despite the continuing advances in biomedical text mining methods, few have been dedicated to topic annotations in COVID-19 literature. To close the gap, we organized the BioCreative LitCovid track to call for a community effort to tackle automated topic annotation for COVID-19 literature. The BioCreative LitCovid dataset, consisting of over 30,000 articles with manually reviewed topics, was created for training and testing. It is one of the largest multilabel classification datasets in biomedical scientific literature. 19 teams worldwide participated and made 80 submissions in total. Most teams used hybrid systems based on transformers. The highest performing submissions achieved 0.8875, 0.9181, and 0.9394 for macro F1-score, micro F1-score, and instance-based F1-score, respectively. The level of participation and results demonstrate a successful track and help close the gap between dataset curation and method development. The dataset is publicly available via https://ftp.ncbi.nlm.nih.gov/pub/lu/LitCovid/biocreative/ for benchmarking and further development.
    Accurate Molecular-Orbital-Based Machine Learning Energies via Unsupervised Clustering of Chemical Space. (arXiv:2204.09831v1 [physics.chem-ph])
    We introduce an unsupervised clustering algorithm to improve training efficiency and accuracy in predicting energies using molecular-orbital-based machine learning (MOB-ML). This work determines clusters via the Gaussian mixture model (GMM) in an entirely automatic manner and simplifies an earlier supervised clustering approach [J. Chem. Theory Comput., 15, 6668 (2019)] by eliminating both the necessity for user-specified parameters and the training of an additional classifier. Unsupervised clustering results from GMM have the advantage of accurately reproducing chemically intuitive groupings of frontier molecular orbitals and having improved performance with an increasing number of training examples. The resulting clusters from supervised or unsupervised clustering is further combined with scalable Gaussian process regression (GPR) or linear regression (LR) to learn molecular energies accurately by generating a local regression model in each cluster. Among all four combinations of regressors and clustering methods, GMM combined with scalable exact Gaussian process regression (GMM/GPR) is the most efficient training protocol for MOB-ML. The numerical tests of molecular energy learning on thermalized datasets of drug-like molecules demonstrate the improved accuracy, transferability, and learning efficiency of GMM/GPR over not only other training protocols for MOB-ML, i.e., supervised regression-clustering combined with GPR(RC/GPR) and GPR without clustering. GMM/GPR also provide the best molecular energy predictions compared with the ones from literature on the same benchmark datasets. With a lower scaling, GMM/GPR has a 10.4-fold speedup in wall-clock training time compared with scalable exact GPR with a training size of 6500 QM7b-T molecules.
    Memory Bounds for the Experts Problem. (arXiv:2204.09837v1 [cs.DS])
    Online learning with expert advice is a fundamental problem of sequential prediction. In this problem, the algorithm has access to a set of $n$ "experts" who make predictions on each day. The goal on each day is to process these predictions, and make a prediction with the minimum cost. After making a prediction, the algorithm sees the actual outcome on that day, updates its state, and then moves on to the next day. An algorithm is judged by how well it does compared to the best expert in the set. The classical algorithm for this problem is the multiplicative weights algorithm. However, every application, to our knowledge, relies on storing weights for every expert, and uses $\Omega(n)$ memory. There is little work on understanding the memory required to solve the online learning with expert advice problem, or run standard sequential prediction algorithms, in natural streaming models, which is especially important when the number of experts, as well as the number of days on which the experts make predictions, is large. We initiate the study of the learning with expert advice problem in the streaming setting, and show lower and upper bounds. Our lower bound for i.i.d., random order, and adversarial order streams uses a reduction to a custom-built problem using a novel masking technique, to show a smooth trade-off for regret versus memory. Our upper bounds show novel ways to run standard sequential prediction algorithms in rounds on small "pools" of experts, thus reducing the necessary memory. For random-order streams, we show that our upper bound is tight up to low order terms. We hope that these results and techniques will have broad applications in online learning, and can inspire algorithms based on standard sequential prediction techniques, like multiplicative weights, for a wide range of other problems in the memory-constrained setting.
    FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis. (arXiv:2204.09934v1 [eess.AS])
    Denoising diffusion probabilistic models (DDPMs) have recently achieved leading performances in many generative tasks. However, the inherited iterative sampling process costs hindered their applications to speech synthesis. This paper proposes FastDiff, a fast conditional diffusion model for high-quality speech synthesis. FastDiff employs a stack of time-aware location-variable convolutions of diverse receptive field patterns to efficiently model long-term time dependencies with adaptive conditions. A noise schedule predictor is also adopted to reduce the sampling steps without sacrificing the generation quality. Based on FastDiff, we design an end-to-end text-to-speech synthesizer, FastDiff-TTS, which generates high-fidelity speech waveforms without any intermediate feature (e.g., Mel-spectrogram). Our evaluation of FastDiff demonstrates the state-of-the-art results with higher-quality (MOS 4.28) speech samples. Also, FastDiff enables a sampling speed of 58x faster than real-time on a V100 GPU, making diffusion models practically applicable to speech synthesis deployment for the first time. We further show that FastDiff generalized well to the mel-spectrogram inversion of unseen speakers, and FastDiff-TTS outperformed other competing methods in end-to-end text-to-speech synthesis. Audio samples are available at \url{https://FastDiff.github.io/}.
    Eliminating Backdoor Triggers for Deep Neural Networks Using Attention Relation Graph Distillation. (arXiv:2204.09975v1 [cs.LG])
    Due to the prosperity of Artificial Intelligence (AI) techniques, more and more backdoors are designed by adversaries to attack Deep Neural Networks (DNNs).Although the state-of-the-art method Neural Attention Distillation (NAD) can effectively erase backdoor triggers from DNNs, it still suffers from non-negligible Attack Success Rate (ASR) together with lowered classification ACCuracy (ACC), since NAD focuses on backdoor defense using attention features (i.e., attention maps) of the same order. In this paper, we introduce a novel backdoor defense framework named Attention Relation Graph Distillation (ARGD), which fully explores the correlation among attention features with different orders using our proposed Attention Relation Graphs (ARGs). Based on the alignment of ARGs between both teacher and student models during knowledge distillation, ARGD can eradicate more backdoor triggers than NAD. Comprehensive experimental results show that, against six latest backdoor attacks, ARGD outperforms NAD by up to 94.85% reduction in ASR, while ACC can be improved by up to 3.23%.
    Social Media Sentiment Analysis for Cryptocurrency Market Prediction. (arXiv:2204.10185v1 [cs.CL])
    In this paper, we explore the usability of different natural language processing models for the sentiment analysis of social media applied to financial market prediction, using the cryptocurrency domain as a reference. We study how the different sentiment metrics are correlated with the price movements of Bitcoin. For this purpose, we explore different methods to calculate the sentiment metrics from a text finding most of them not very accurate for this prediction task. We find that one of the models outperforms more than 20 other public ones and makes it possible to fine-tune it efficiently given its interpretable nature. Thus we confirm that interpretable artificial intelligence and natural language processing methods might be more valuable practically than non-explainable and non-interpretable ones. In the end, we analyse potential causal connections between the different sentiment metrics and the price movements.
    FedCL: Federated Contrastive Learning for Privacy-Preserving Recommendation. (arXiv:2204.09850v1 [cs.LG])
    Contrastive learning is widely used for recommendation model learning, where selecting representative and informative negative samples is critical. Existing methods usually focus on centralized data, where abundant and high-quality negative samples are easy to obtain. However, centralized user data storage and exploitation may lead to privacy risks and concerns, while decentralized user data on a single client can be too sparse and biased for accurate contrastive learning. In this paper, we propose a federated contrastive learning method named FedCL for privacy-preserving recommendation, which can exploit high-quality negative samples for effective model training with privacy well protected. We first infer user embeddings from local user data through the local model on each client, and then perturb them with local differential privacy (LDP) before sending them to a central server for hard negative sampling. Since individual user embedding contains heavy noise due to LDP, we propose to cluster user embeddings on the server to mitigate the influence of noise, and the cluster centroids are used to retrieve hard negative samples from the item pool. These hard negative samples are delivered to user clients and mixed with the observed negative samples from local data as well as in-batch negatives constructed from positive samples for federated model training. Extensive experiments on four benchmark datasets show FedCL can empower various recommendation methods in a privacy-preserving way.
    Adversarial Contrastive Learning by Permuting Cluster Assignments. (arXiv:2204.10314v1 [cs.LG])
    Contrastive learning has gained popularity as an effective self-supervised representation learning technique. Several research directions improve traditional contrastive approaches, e.g., prototypical contrastive methods better capture the semantic similarity among instances and reduce the computational burden by considering cluster prototypes or cluster assignments, while adversarial instance-wise contrastive methods improve robustness against a variety of attacks. To the best of our knowledge, no prior work jointly considers robustness, cluster-wise semantic similarity and computational efficiency. In this work, we propose SwARo, an adversarial contrastive framework that incorporates cluster assignment permutations to generate representative adversarial samples. We evaluate SwARo on multiple benchmark datasets and against various white-box and black-box attacks, obtaining consistent improvements over state-of-the-art baselines.
    TND-NAS: Towards Non-Differentiable Objectives in Differentiable Neural Architecture Search. (arXiv:2111.03892v2 [cs.LG] UPDATED)
    Differentiable architecture search has gradually become the mainstream research topic in the field of Neural Architecture Search (NAS) for its high efficiency compared with the early NAS (EA-based, RL-based) methods. Recent differentiable NAS also aims at further improving the search performance and reducing the GPU-memory consumption. However, these methods are no longer naturally capable of tackling the non-differentiable objectives, e.g., energy, resource-constrained efficiency, and other metrics, let alone the multi-objective search demands. Researches in the multi-objective NAS field target this but requires vast computational resources cause of the sole optimization of each candidate architecture. In light of this discrepancy, we propose the TND-NAS, which is with the merits of the high efficiency in differentiable NAS framework and the compatibility among non-differentiable metrics in Multi-objective NAS. Under the differentiable NAS framework, with the continuous relaxation of the search space, TND-NAS has the architecture parameters ($\alpha$) been optimized in discrete space, while resorting to the progressive search space shrinking by $\alpha$. Our representative experiment takes two objectives (Parameters, Accuracy) as an example, we achieve a series of high-performance compact architectures on CIFAR10 (1.09M/3.3\%, 2.4M/2.95\%, 9.57M/2.54\%) and CIFAR100 (2.46M/18.3\%, 5.46/16.73\%, 12.88/15.20\%) datasets. Favorably, compared with other multi-objective NAS methods, TND-NAS is less time-consuming (1.3 GPU-days on NVIDIA 1080Ti, 1/6 of that in NSGA-Net), and can be conveniently adapted to real-world NAS scenarios (resource-constrained, platform-specialized).
    Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning. (arXiv:2106.09226v2 [cs.LG] UPDATED)
    Pretrained language models have achieved state-of-the-art performance when adapted to a downstream NLP task. However, theoretical analysis of these models is scarce and challenging since the pretraining and downstream tasks can be very different. We propose an analysis framework that links the pretraining and downstream tasks with an underlying latent variable generative model of text -- the downstream classifier must recover a function of the posterior distribution over the latent variables. We analyze head tuning (learning a classifier on top of the frozen pretrained model) and prompt tuning in this setting. The generative model in our analysis is either a Hidden Markov Model (HMM) or an HMM augmented with a latent memory component, motivated by long-term dependencies in natural language. We show that 1) under certain non-degeneracy conditions on the HMM, simple classification heads can solve the downstream task, 2) prompt tuning obtains downstream guarantees with weaker non-degeneracy conditions, and 3) our recovery guarantees for the memory-augmented HMM are stronger than for the vanilla HMM because task-relevant information is easier to recover from the long-term memory. Experiments on synthetically generated data from HMMs back our theoretical findings.  ( 2 min )
    Surfer100: Generating Surveys From Web Resources on Wikipedia-style. (arXiv:2112.06377v2 [cs.CL] UPDATED)
    Fast-developing fields such as Artificial Intelligence (AI) often outpace the efforts of encyclopedic sources such as Wikipedia, which either do not completely cover recently-introduced topics or lack such content entirely. As a result, methods for automatically producing content are valuable tools to address this information overload. We show that recent advances in pretrained language modeling can be combined for a two-stage extractive and abstractive approach for Wikipedia lead paragraph generation. We extend this approach to generate longer Wikipedia-style summaries with sections and examine how such methods struggle in this application through detailed studies with 100 reference human-collected surveys. This is the first study on utilizing web resources for long Wikipedia-style summaries to the best of our knowledge.  ( 2 min )
    Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity. (arXiv:2111.05329v3 [cs.CV] UPDATED)
    We present CrissCross, a self-supervised framework for learning audio-visual representations. A novel notion is introduced in our framework whereby in addition to learning the intra-modal and standard synchronous cross-modal relations, CrissCross also learns asynchronous cross-modal relationships. We show that by relaxing the temporal synchronicity between the audio and visual modalities, the network learns strong generalized representations. Our experiments show that strong augmentations for both audio and visual modalities with relaxation of cross-modal temporal synchronicity optimize performance. To pretrain our proposed framework, we use 3 different datasets with varying sizes, Kinetics-Sound, Kinetics400, and AudioSet. The learned representations are evaluated on a number of downstream tasks namely action recognition, sound classification, and retrieval. CrissCross shows state-of-the-art performances on action recognition (UCF101 and HMDB51) and sound classification (ESC50 and DCASE). The codes and pretrained models will be made publicly available.  ( 2 min )
    NICO++: Towards Better Benchmarking for Domain Generalization. (arXiv:2204.08040v2 [cs.CV] UPDATED)
    Despite the remarkable performance that modern deep neural networks have achieved on independent and identically distributed (I.I.D.) data, they can crash under distribution shifts. Most current evaluation methods for domain generalization (DG) adopt the leave-one-out strategy as a compromise on the limited number of domains. We propose a large-scale benchmark with extensive labeled domains named NICO++ along with more rational evaluation methods for comprehensively evaluating DG algorithms. To evaluate DG datasets, we propose two metrics to quantify covariate shift and concept shift, respectively. Two novel generalization bounds from the perspective of data construction are proposed to prove that limited concept shift and significant covariate shift favor the evaluation capability for generalization. Through extensive experiments, NICO++ shows its superior evaluation capability compared with current DG datasets and its contribution in alleviating unfairness caused by the leak of oracle knowledge in model selection.
    Relevance-guided Unsupervised Discovery of Abilities with Quality-Diversity Algorithms. (arXiv:2204.09828v1 [cs.NE])
    Quality-Diversity algorithms provide efficient mechanisms to generate large collections of diverse and high-performing solutions, which have shown to be instrumental for solving downstream tasks. However, most of those algorithms rely on a behavioural descriptor to characterise the diversity that is hand-coded, hence requiring prior knowledge about the considered tasks. In this work, we introduce Relevance-guided Unsupervised Discovery of Abilities; a Quality-Diversity algorithm that autonomously finds a behavioural characterisation tailored to the task at hand. In particular, our method introduces a custom diversity metric that leads to higher densities of solutions near the areas of interest in the learnt behavioural descriptor space. We evaluate our approach on a simulated robotic environment, where the robot has to autonomously discover its abilities based on its full sensory data. We evaluated the algorithms on three tasks: navigation to random targets, moving forward with a high velocity, and performing half-rolls. The experimental results show that our method manages to discover collections of solutions that are not only diverse, but also well-adapted to the considered downstream task.
    DeepGate: Learning Neural Representations of Logic Gates. (arXiv:2111.14616v3 [cs.LG] UPDATED)
    Applying deep learning (DL) techniques in the electronic design automation (EDA) field has become a trending topic. Most solutions apply well-developed DL models to solve specific EDA problems. While demonstrating promising results, they require careful model tuning for every problem. The fundamental question on "How to obtain a general and effective neural representation of circuits?" has not been answered yet. In this work, we take the first step towards solving this problem. We propose DeepGate, a novel representation learning solution that effectively embeds both logic function and structural information of a circuit as vectors on each gate. Specifically, we propose transforming circuits into unified and-inverter graph format for learning and using signal probabilities as the supervision task in DeepGate. We then introduce a novel graph neural network that uses strong inductive biases in practical circuits as learning priors for signal probability prediction. Our experimental results show the efficacy and generalization capability of DeepGate.
    BTranspose: Bottleneck Transformers for Human Pose Estimation with Self-Supervised Pre-Training. (arXiv:2204.10209v1 [cs.LG])
    The task of 2D human pose estimation is challenging as the number of keypoints is typically large (~ 17) and this necessitates the use of robust neural network architectures and training pipelines that can capture the relevant features from the input image. These features are then aggregated to make accurate heatmap predictions from which the final keypoints of human body parts can be inferred. Many papers in literature use CNN-based architectures for the backbone, and/or combine it with a transformer, after which the features are aggregated to make the final keypoint predictions [1]. In this paper, we consider the recently proposed Bottleneck Transformers [2], which combine CNN and multi-head self attention (MHSA) layers effectively, and we integrate it with a Transformer encoder and apply it to the task of 2D human pose estimation. We consider different backbone architectures and pre-train them using the DINO self-supervised learning method [3], this pre-training is found to improve the overall prediction accuracy. We call our model BTranspose, and experiments show that on the COCO validation set, our model achieves an AP of 76.4, which is competitive with other methods such as [1] and has fewer network parameters. Furthermore, we also present the dependencies of the final predicted keypoints on both the MHSA block and the Transformer encoder layers, providing clues on the image sub-regions the network attends to at the mid and high levels.
    Understanding the Domain Gap in LiDAR Object Detection Networks. (arXiv:2204.10024v1 [cs.CV])
    In order to make autonomous driving a reality, artificial neural networks have to work reliably in the open-world. However, the open-world is vast and continuously changing, so it is not technically feasible to collect and annotate training datasets which accurately represent this domain. Therefore, there are always domain gaps between training datasets and the open-world which must be understood. In this work, we investigate the domain gaps between high-resolution and low-resolution LiDAR sensors in object detection networks. Using a unique dataset, which enables us to study sensor resolution domain gaps independent of other effects, we show two distinct domain gaps - an inference domain gap and a training domain gap. The inference domain gap is characterised by a strong dependence on the number of LiDAR points per object, while the training gap shows no such dependence. These fndings show that different approaches are required to close these inference and training domain gaps.
    Towards Reliable Neural Generative Modeling of Detectors. (arXiv:2204.09947v1 [physics.ins-det])
    The increasing luminosities of future data taking at Large Hadron Collider and next generation collider experiments require an unprecedented amount of simulated events to be produced. Such large scale productions demand a significant amount of valuable computing resources. This brings a demand to use new approaches to event generation and simulation of detector responses. In this paper, we discuss the application of generative adversarial networks (GANs) to the simulation of the LHCb experiment events. We emphasize main pitfalls in the application of GANs and study the systematic effects in detail. The presented results are based on the Geant4 simulation of the LHCb Cherenkov detector.
    Conditional entropy minimization principle for learning domain invariant representation features. (arXiv:2201.10460v3 [cs.LG] UPDATED)
    Invariance principle-based methods, for example, Invariant Risk Minimization (IRM), have recently emerged as promising approaches for Domain Generalization (DG). Despite the promising theory, invariance principle-based approaches fail in common classification tasks due to the mixture of the true invariant features and the spurious invariant features. In this paper, we propose a framework based on the conditional entropy minimization principle to filter out the spurious invariant features leading to a new algorithm with a better generalization capability. We theoretically prove that under some particular assumptions, the representation function can precisely recover the true invariant features. In addition, we also show that the proposed approach is closely related to the well-known Information Bottleneck (IB) framework. Both the theoretical and numerical results are provided to justify our approach.
    Scalable Sensitivity and Uncertainty Analysis for Causal-Effect Estimates of Continuous-Valued Interventions. (arXiv:2204.10022v1 [cs.LG])
    Estimating the effects of continuous-valued interventions from observational data is critically important in fields such as climate science, healthcare, and economics. Recent work focuses on designing neural-network architectures and regularization functions to allow for scalable estimation of average and individual-level dose response curves from high-dimensional, large-sample data. Such methodologies assume ignorability (all confounding variables are observed) and positivity (all levels of treatment can be observed for every unit described by a given covariate value), which are especially challenged in the continuous treatment regime. Developing scalable sensitivity and uncertainty analyses that allow us to understand the ignorance induced in our estimates when these assumptions are relaxed receives less attention. Here, we develop a continuous treatment-effect marginal sensitivity model (CMSM) and derive bounds that agree with both the observed data and a researcher-defined level of hidden confounding. We introduce a scalable algorithm to derive the bounds and uncertainty-aware deep models to efficiently estimate these bounds for high-dimensional, large-sample observational data. We validate our methods using both synthetic and real-world experiments. For the latter, we work in concert with climate scientists interested in evaluating the climatological impacts of human emissions on cloud properties using satellite observations from the past 15 years: a finite-data problem known to be complicated by the presence of a multitude of unobserved confounders.
    Intact-VAE: Estimating Treatment Effects under Unobserved Confounding. (arXiv:2101.06662v3 [stat.ML] UPDATED)
    NOTE: This preprint has a flawed theoretical formulation. Please avoid it and refer to the ICLR22 publication https://openreview.net/forum?id=q7n2RngwOM. Also, arXiv:2109.15062 contains some new ideas on unobserved Confounding. As an important problem of causal inference, we discuss the identification and estimation of treatment effects under unobserved confounding. Representing the confounder as a latent variable, we propose Intact-VAE, a new variant of variational autoencoder (VAE), motivated by the prognostic score that is sufficient for identifying treatment effects. We theoretically show that, under certain settings, treatment effects are identified by our model, and further, based on the identifiability of our model (i.e., determinacy of representation), our VAE is a consistent estimator with representation balanced for treatment groups. Experiments on (semi-)synthetic datasets show state-of-the-art performance under diverse settings.
    Learning Forward Dynamics Model and Informed Trajectory Sampler for Safe Quadruped Navigation. (arXiv:2204.08647v3 [cs.RO] UPDATED)
    For autonomous quadruped robot navigation in various complex environments, a typical SOTA system is composed of four main modules -- mapper, global planner, local planner, and command-tracking controller -- in a hierarchical manner. In this paper, we build a robust and safe local planner which is designed to generate a velocity plan to track a coarsely planned path from the global planner. Previous works used waypoint-based methods (e.g. Proportional-Differential control and pure pursuit) which simplify the path tracking problem to local point-goal navigation. However, they suffer from frequent collisions in geometrically complex and narrow environments because of two reasons; the global planner uses a coarse and inaccurate model and the local planner is unable to track the global plan sufficiently well. Currently, deep learning methods are an appealing alternative because they can learn safety and path feasibility from experience more accurately. However, existing deep learning methods are not capable of planning for a long horizon. In this work, we propose a learning-based fully autonomous navigation framework composed of three innovative elements: a learned forward dynamics model (FDM), an online sampling-based model-predictive controller, and an informed trajectory sampler (ITS). Using our framework, a quadruped robot can autonomously navigate in various complex environments without a collision and generate a smoother command plan compared to the baseline method. Furthermore, our method can reactively handle unexpected obstacles on the planned path and avoid them. Project page https://awesomericky.github.io/projects/FDM_ITS_navigation/.
    Revisiting Consistency Regularization for Semi-supervised Change Detection in Remote Sensing Images. (arXiv:2204.08454v3 [cs.CV] UPDATED)
    Remote-sensing (RS) Change Detection (CD) aims to detect "changes of interest" from co-registered bi-temporal images. The performance of existing deep supervised CD methods is attributed to the large amounts of annotated data used to train the networks. However, annotating large amounts of remote sensing images is labor-intensive and expensive, particularly with bi-temporal images, as it requires pixel-wise comparisons by a human expert. On the other hand, we often have access to unlimited unlabeled multi-temporal RS imagery thanks to ever-increasing earth observation programs. In this paper, we propose a simple yet effective way to leverage the information from unlabeled bi-temporal images to improve the performance of CD approaches. More specifically, we propose a semi-supervised CD model in which we formulate an unsupervised CD loss in addition to the supervised Cross-Entropy (CE) loss by constraining the output change probability map of a given unlabeled bi-temporal image pair to be consistent under the small random perturbations applied on the deep feature difference map that is obtained by subtracting their latent feature representations. Experiments conducted on two publicly available CD datasets show that the proposed semi-supervised CD method can reach closer to the performance of supervised CD even with access to as little as 10% of the annotated training data. Code available at https://github.com/wgcban/SemiCD
    Wrapped Distributions on homogeneous Riemannian manifolds. (arXiv:2204.09790v1 [math.ST])
    We provide a general framework for constructing probability distributions on Riemannian manifolds, taking advantage of area-preserving maps and isometries. Control over distributions' properties, such as parameters, symmetry and modality yield a family of flexible distributions that are straightforward to sample from, suitable for use within Monte Carlo algorithms and latent variable models, such as autoencoders. As an illustration, we empirically validate our approach by utilizing our proposed distributions within a variational autoencoder and a latent space network model. Finally, we take advantage of the generalized description of this framework to posit questions for future work.
    Debiased Learning from Naturally Imbalanced Pseudo-Labels. (arXiv:2201.01490v2 [cs.LG] UPDATED)
    Pseudo-labels are confident predictions made on unlabeled target data by a classifier trained on labeled source data. They are widely used for adapting a model to unlabeled data, e.g., in a semi-supervised learning setting. Our key insight is that pseudo-labels are naturally imbalanced due to intrinsic data similarity, even when a model is trained on balanced source data and evaluated on balanced target data. If we address this previously unknown imbalanced classification problem arising from pseudo-labels instead of ground-truth training labels, we could remove model biases towards false majorities created by pseudo-labels. We propose a novel and effective debiased learning method with pseudo-labels, based on counterfactual reasoning and adaptive margins: The former removes the classifier response bias, whereas the latter adjusts the margin of each class according to the imbalance of pseudo-labels. Validated by extensive experimentation, our simple debiased learning delivers significant accuracy gains over the state-of-the-art on ImageNet-1K: 26% for semi-supervised learning with 0.2% annotations and 9% for zero-shot learning. Our code is available at: https://github.com/frank-xwang/debiased-pseudo-labeling.
    Cross-Speaker Emotion Transfer for Low-Resource Text-to-Speech Using Non-Parallel Voice Conversion with Pitch-Shift Data Augmentation. (arXiv:2204.10020v1 [eess.AS])
    Data augmentation via voice conversion (VC) has been successfully applied to low-resource expressive text-to-speech (TTS) when only neutral data for the target speaker are available. Although the quality of VC is crucial for this approach, it is challenging to learn a stable VC model because the amount of data is limited in low-resource scenarios, and highly expressive speech has large acoustic variety. To address this issue, we propose a novel data augmentation method that combines pitch-shifting and VC techniques. Because pitch-shift data augmentation enables the coverage of a variety of pitch dynamics, it greatly stabilizes training for both VC and TTS models, even when only 1,000 utterances of the target speaker's neutral data are available. Subjective test results showed that a FastSpeech 2-based emotional TTS system with the proposed method improved naturalness and emotional similarity compared with conventional methods.
    One-Step Abductive Multi-Target Learning with Diverse Noisy Samples: An Application to Tumour Segmentation for Breast Cancer. (arXiv:2110.10325v5 [cs.LG] UPDATED)
    Recent studies have demonstrated the effectiveness of the combination of machine learning and logical reasoning in inventing advanced artificial intelligence technologies. One-step abductive multi-target learning (OSAMTL), an approach that only combines machine learning and logical reasoning in a one-step balanced way, has as well shown its effectiveness in handling complex noisy labels of a single noisy sample in medical histopathology whole slide image analysis (MHWSIA). However, OSAMTL is not suitable for the situation where diverse noisy samples (DiNS) are provided for a learning task. In this paper, giving definition of DiNS, we propose one-step abductive multi-target learning with DiNS (OSAMTL-DiNS) to expand the original OSAMTL to handle complex noisy labels of DiNS. Applying OSAMTL-DiNS to tumour segmentation for breast cancer in MHWSIA, we show that OSAMTL-DiNS is able to enable various state-of-the-art approaches for learning from noisy labels to achieve more rational predictions.  ( 2 min )
    Dynamical simulation via quantum machine learning with provable generalization. (arXiv:2204.10269v1 [quant-ph])
    Much attention has been paid to dynamical simulation and quantum machine learning (QML) independently as applications for quantum advantage, while the possibility of using QML to enhance dynamical simulations has not been thoroughly investigated. Here we develop a framework for using QML methods to simulate quantum dynamics on near-term quantum hardware. We use generalization bounds, which bound the error a machine learning model makes on unseen data, to rigorously analyze the training data requirements of an algorithm within this framework. This provides a guarantee that our algorithm is resource-efficient, both in terms of qubit and data requirements. Our numerics exhibit efficient scaling with problem size, and we simulate 20 times longer than Trotterization on IBMQ-Bogota.  ( 2 min )
    Handling Imbalanced Classification Problems With Support Vector Machines via Evolutionary Bilevel Optimization. (arXiv:2204.10231v1 [cs.LG])
    Support vector machines (SVMs) are popular learning algorithms to deal with binary classification problems. They traditionally assume equal misclassification costs for each class; however, real-world problems may have an uneven class distribution. This article introduces EBCS-SVM: evolutionary bilevel cost-sensitive SVMs. EBCS-SVM handles imbalanced classification problems by simultaneously learning the support vectors and optimizing the SVM hyperparameters, which comprise the kernel parameter and misclassification costs. The resulting optimization problem is a bilevel problem, where the lower level determines the support vectors and the upper level the hyperparameters. This optimization problem is solved using an evolutionary algorithm (EA) at the upper level and sequential minimal optimization (SMO) at the lower level. These two methods work in a nested fashion, that is, the optimal support vectors help guide the search of the hyperparameters, and the lower level is initialized based on previous successful solutions. The proposed method is assessed using 70 datasets of imbalanced classification and compared with several state-of-the-art methods. The experimental results, supported by a Bayesian test, provided evidence of the effectiveness of EBCS-SVM when working with highly imbalanced datasets.  ( 2 min )
    DooDLeNet: Double DeepLab Enhanced Feature Fusion for Thermal-color Semantic Segmentation. (arXiv:2204.10266v1 [cs.LG])
    In this paper we present a new approach for feature fusion between RGB and LWIR Thermal images for the task of semantic segmentation for driving perception. We propose DooDLeNet, a double DeepLab architecture with specialized encoder-decoders for thermal and color modalities and a shared decoder for final segmentation. We combine two strategies for feature fusion: confidence weighting and correlation weighting. We report state-of-the-art mean IoU results on the MF dataset.  ( 2 min )
    Graph Convolutional Networks for Multi-modality Medical Imaging: Methods, Architectures, and Clinical Applications. (arXiv:2202.08916v3 [eess.IV] UPDATED)
    Image-based characterization and disease understanding involve integrative analysis of morphological, spatial, and topological information across biological scales. The development of graph convolutional networks (GCNs) has created the opportunity to address this information complexity via graph-driven architectures, since GCNs can perform feature aggregation, interaction, and reasoning with remarkable flexibility and efficiency. These GCNs capabilities have spawned a new wave of research in medical imaging analysis with the overarching goal of improving quantitative disease understanding, monitoring, and diagnosis. Yet daunting challenges remain for designing the important image-to-graph transformation for multi-modality medical imaging and gaining insights into model interpretation and enhanced clinical decision support. In this review, we present recent GCNs developments in the context of medical image analysis including imaging data from radiology and histopathology. We discuss the fast-growing use of graph network architectures in medical image analysis to improve disease diagnosis and patient outcomes in clinical practice. To foster cross-disciplinary research, we present GCNs technical advancements, emerging medical applications, identify common challenges in the use of image-based GCNs and their extensions in model interpretation, large-scale benchmarks that promise to transform the scope of medical image studies and related graph-driven medical research.  ( 2 min )
    Physical Modeling using Recurrent Neural Networks with Fast Convolutional Layers. (arXiv:2204.10125v1 [cs.SD])
    Discrete-time modeling of acoustic, mechanical and electrical systems is a prominent topic in the musical signal processing literature. Such models are mostly derived by discretizing a mathematical model, given in terms of ordinary or partial differential equations, using established techniques. Recent work has applied the techniques of machine-learning to construct such models automatically from data for the case of systems which have lumped states described by scalar values, such as electrical circuits. In this work, we examine how similar techniques are able to construct models of systems which have spatially distributed rather than lumped states. We describe several novel recurrent neural network structures, and show how they can be thought of as an extension of modal techniques. As a proof of concept, we generate synthetic data for three physical systems and show that the proposed network structures can be trained with this data to reproduce the behavior of these systems.
    OCTOPUS -- optical coherence tomography plaque and stent analysis software. (arXiv:2204.10212v1 [eess.IV])
    Compared with other imaging modalities, intravascular optical coherence tomography (IVOCT) has significant advantages for guiding percutaneous coronary interventions. To aid IVOCT research studies, we developed the Optical Coherence TOmography PlaqUe and Stent (OCTOPUS) analysis software. To automate image analysis results, the software includes several important algorithmic steps: pre-processing, deep learning plaque segmentation, machine learning identification of stent struts, and registration of pullbacks. Interactive visualization and manual editing of segmentations were included in the software. Quantifications include stent deployment characteristics (e.g., stent strut malapposition), strut level analysis, calcium angle, and calcium thickness measurements. Interactive visualizations include (x,y) anatomical, en face, and longitudinal views with optional overlays. Underlying plaque segmentation algorithm yielded excellent pixel-wise results (86.2% sensitivity and 0.781 F1 score). Using OCTOPUS on 34 new pullbacks, we determined that following automated segmentation, only 13% and 23% of frames needed any manual touch up for detailed lumen and calcification labeling, respectively. Only up to 3.8% of plaque pixels were modified, leading to an average editing time of only 7.5 seconds/frame, an approximately 80% reduction compared to manual analysis. Regarding stent analysis, sensitivity and precision were both greater than 90%, and each strut was successfully classified as either covered or uncovered with high sensitivity (94%) and specificity (90%). We introduced and evaluated the clinical application of a highly automated software package, OCTOPUS, for quantitative plaque and stent analysis in IVOCT images. The software is currently used as an offline tool for research purposes; however, the software's embedded algorithms may also be useful for real-time treatment planning.  ( 2 min )
    Sketch2PQ: Freeform Planar Quadrilateral Mesh Design via a Single Sketch. (arXiv:2201.09367v3 [cs.GR] UPDATED)
    The freeform architectural modeling process often involves two important stages: concept design and digital modeling. In the first stage, architects usually sketch the overall 3D shape and the panel layout on a physical or digital paper briefly. In the second stage, a digital 3D model is created using the sketch as a reference. The digital model needs to incorporate geometric requirements for its components, such as the planarity of panels due to consideration of construction costs, which can make the modeling process more challenging. In this work, we present a novel sketch-based system to bridge the concept design and digital modeling of freeform roof-like shapes represented as planar quadrilateral (PQ) meshes. Our system allows the user to sketch the surface boundary and contour lines under axonometric projection and supports the sketching of occluded regions. In addition, the user can sketch feature lines to provide directional guidance to the PQ mesh layout. Given the 2D sketch input, we propose a deep neural network to infer in real-time the underlying surface shape along with a dense conjugate direction field, both of which are used to extract the final PQ mesh. To train and validate our network, we generate a large synthetic dataset that mimics architect sketching of freeform quadrilateral patches. The effectiveness and usability of our system are demonstrated with quantitative and qualitative evaluation as well as user studies.  ( 2 min )
    Assessing Machine Learning Algorithms for Near-Real Time Bus Ridership Prediction During Extreme Weather. (arXiv:2204.09792v1 [stat.AP])
    Given an increasingly volatile climate, the relationship between weather and transit ridership has drawn increasing interest. However, challenges stemming from spatio-temporal dependency and non-stationarity have not been fully addressed in modelling and predicting transit ridership under the influence of weather conditions especially with the traditional statistical approaches. Drawing on three-month smart card data in Brisbane, Australia, this research adopts and assesses a suite of machine-learning algorithms, i.e., random forest, eXtreme Gradient Boosting (XGBoost) and Tweedie XGBoost, to model and predict near real-time bus ridership in relation to sudden change of weather conditions. The study confirms that there indeed exists a significant level of spatio-temporal variability of weather-ridership relationship, which produces equally dynamic patterns of prediction errors. Further comparison of model performance suggests that Tweedie XGBoost outperforms the other two machine-learning algorithms in generating overall more accurate prediction outcomes in space and time. Future research may advance the current study by drawing on larger data sets and applying more advanced machine and deep-learning approaches to provide more enhanced evidence for real-time operation of transit systems.  ( 2 min )
    The 2021 NIST Speaker Recognition Evaluation. (arXiv:2204.10242v1 [eess.AS])
    The 2021 Speaker Recognition Evaluation (SRE21) was the latest cycle of the ongoing evaluation series conducted by the U.S. National Institute of Standards and Technology (NIST) since 1996. It was the second large-scale multimodal speaker/person recognition evaluation organized by NIST (the first one being SRE19). Similar to SRE19, it featured two core evaluation tracks, namely audio and audio-visual, as well as an optional visual track. In addition to offering fixed and open training conditions, it also introduced new challenges for the community, thanks to a new multimodal (i.e., audio, video, and selfie images) and multilingual (i.e., with multilingual speakers) corpus, termed WeCanTalk, collected outside North America by the Linguistic Data Consortium (LDC). These challenges included: 1) trials (target and non-target) with enrollment and test segments originating from different domains (i.e., telephony versus video), and 2) trials (target and non-target) with enrollment and test segments spoken in different languages (i.e., cross-lingual trials). This paper presents an overview of SRE21 including the tasks, performance metric, data, evaluation protocol, results and system performance analyses. A total of 23 organizations (forming 15 teams) from academia and industry participated in SRE21 and submitted 158 valid system outputs. Evaluation results indicate: audio-visual fusion produce substantial gains in performance over audio-only or visual-only systems; top performing speaker and face recognition systems exhibited comparable performance under the matched domain conditions present in this evaluation; and, the use of complex neural network architectures (e.g., ResNet) along with angular losses with margin, data augmentation, as well as long duration fine-tuning contributed to notable performance improvements for the audio-only speaker recognition task.  ( 2 min )
    Radio Galaxy Zoo: Using semi-supervised learning to leverage large unlabelled data-sets for radio galaxy classification under data-set shift. (arXiv:2204.08816v3 [astro-ph.GA] UPDATED)
    In this work we examine the classification accuracy and robustness of a state-of-the-art semi-supervised learning (SSL) algorithm applied to the morphological classification of radio galaxies. We test if SSL with fewer labels can achieve test accuracies comparable to the supervised state-of-the-art and whether this holds when incorporating previously unseen data. We find that for the radio galaxy classification problem considered, SSL provides additional regularisation and outperforms the baseline test accuracy. However, in contrast to model performance metrics reported on computer science benchmarking data-sets, we find that improvement is limited to a narrow range of label volumes, with performance falling off rapidly at low label volumes. Additionally, we show that SSL does not improve model calibration, regardless of whether classification is improved. Moreover, we find that when different underlying catalogues drawn from the same radio survey are used to provide the labelled and unlabelled data-sets required for SSL, a significant drop in classification performance is observered, highlighting the difficulty of applying SSL techniques under dataset shift. We show that a class-imbalanced unlabelled data pool negatively affects performance through prior probability shift, which we suggest may explain this performance drop, and that using the Frechet Distance between labelled and unlabelled data-sets as a measure of data-set shift can provide a prediction of model performance, but that for typical radio galaxy data-sets with labelled sample volumes of O(1000), the sample variance associated with this technique is high and the technique is in general not sufficiently robust to replace a train-test cycle.  ( 2 min )
    Linear convergence of a policy gradient method for finite horizon continuous time stochastic control problems. (arXiv:2203.11758v2 [math.OC] UPDATED)
    Despite its popularity in the reinforcement learning community, a provably convergent policy gradient method for general continuous space-time stochastic control problems has been elusive. This paper closes the gap by proposing a proximal gradient algorithm for feedback controls of finite-time horizon stochastic control problems. The state dynamics are continuous time nonlinear diffusions with controlled drift and possibly degenerate noise, and the objectives are nonconvex in the state and nonsmooth in the control. We prove under suitable conditions that the algorithm converges linearly to a stationary point of the control problem, and is stable with respect to policy updates by approximate gradient steps. The convergence result justifies the recent reinforcement learning heuristics that adding entropy regularization or a fictitious discount factor to the optimization objective accelerates the convergence of policy gradient methods. The proof exploits careful regularity estimates of backward stochastic differential equations.  ( 2 min )
    Scalable One-Pass Optimisation of High-Dimensional Weight-Update Hyperparameters by Implicit Differentiation. (arXiv:2110.10461v3 [cs.LG] UPDATED)
    Machine learning training methods depend plentifully and intricately on hyperparameters, motivating automated strategies for their optimisation. Many existing algorithms restart training for each new hyperparameter choice, at considerable computational cost. Some hypergradient-based one-pass methods exist, but these either cannot be applied to arbitrary optimiser hyperparameters (such as learning rates and momenta) or take several times longer to train than their base models. We extend these existing methods to develop an approximate hypergradient-based hyperparameter optimiser which is applicable to any continuous hyperparameter appearing in a differentiable model weight update, yet requires only one training episode, with no restarts. We also provide a motivating argument for convergence to the true hypergradient, and perform tractable gradient-based optimisation of independent learning rates for each model parameter. Our method performs competitively from varied random hyperparameter initialisations on several UCI datasets and Fashion-MNIST (using a one-layer MLP), Penn Treebank (using an LSTM) and CIFAR-10 (using a ResNet-18), in time only 2-3x greater than vanilla training.
    Fink: early supernovae Ia classification using active learning. (arXiv:2111.11438v2 [astro-ph.IM] UPDATED)
    We describe how the Fink broker early supernova Ia classifier optimizes its ML classifications by employing an active learning (AL) strategy. We demonstrate the feasibility of implementation of such strategies in the current Zwicky Transient Facility (ZTF) public alert data stream. We compare the performance of two AL strategies: uncertainty sampling and random sampling. Our pipeline consists of 3 stages: feature extraction, classification and learning strategy. Starting from an initial sample of 10 alerts (5 SN Ia and 5 non-Ia), we let the algorithm identify which alert should be added to the training sample. The system is allowed to evolve through 300 iterations. Our data set consists of 23 840 alerts from the ZTF with confirmed classification via cross-match with SIMBAD database and the Transient name server (TNS), 1 600 of which were SNe Ia (1 021 unique objects). The data configuration, after the learning cycle was completed, consists of 310 alerts for training and 23 530 for testing. Averaging over 100 realizations, the classifier achieved 89% purity and 54% efficiency. From 01/November/2020 to 31/October/2021 Fink has applied its early supernova Ia module to the ZTF stream and communicated promising SN Ia candidates to the TNS. From the 535 spectroscopically classified Fink candidates, 459 (86%) were proven to be SNe Ia. Our results confirm the effectiveness of active learning strategies for guiding the construction of optimal training samples for astronomical classifiers. It demonstrates in real data that the performance of learning algorithms can be highly improved without the need of extra computational resources or overwhelmingly large training samples. This is, to our knowledge, the first application of AL to real alerts data.
    Path sampling of recurrent neural networks by incorporating known physics. (arXiv:2203.00597v2 [cond-mat.dis-nn] UPDATED)
    Recurrent neural networks have seen widespread use in modeling dynamical systems in varied domains such as weather prediction, text prediction and several others. Often one wishes to supplement the experimentally observed dynamics with prior knowledge or intuition about the system. While the recurrent nature of these networks allows them to model arbitrarily long memories in the time series used in training, it makes it harder to impose prior knowledge or intuition through generic constraints. In this work, we present a path sampling approach based on principle of Maximum Caliber that allows us to include generic thermodynamic or kinetic constraints into recurrent neural networks. We show the method here for a widely used type of recurrent neural network known as long short-term memory network in the context of supplementing time series collected from different application domains. These include classical Molecular Dynamics of a protein and Monte Carlo simulations of an open quantum system continuously losing photons to the environment and displaying Rabi oscillations. Our method can be easily generalized to other generative artificial intelligence models and to generic time series in different areas of physical and social sciences, where one wishes to supplement limited data with intuition or theory based corrections.
    Deep Learning meets Nonparametric Regression: Are Weight-Decayed DNNs Locally Adaptive?. (arXiv:2204.09664v2 [cs.LG] UPDATED)
    We study the theory of neural network (NN) from the lens of classical nonparametric regression problems with a focus on NN's ability to adaptively estimate functions with heterogeneous smoothness -- a property of functions in Besov or Bounded Variation (BV) classes. Existing work on this problem requires tuning the NN architecture based on the function spaces and sample sizes. We consider a "Parallel NN" variant of deep ReLU networks and show that the standard weight decay is equivalent to promoting the $\ell_p$-sparsity ($0<p<1$) of the coefficient vector of an end-to-end learned function bases, i.e., a dictionary. Using this equivalence, we further establish that by tuning only the weight decay, such Parallel NN achieves an estimation error arbitrarily close to the minimax rates for both the Besov and BV classes. Notably, it gets exponentially closer to minimax optimal as the NN gets deeper. Our research sheds new lights on why depth matters and how NNs are more powerful than kernel methods.
    An Improved Transfer Model: Randomized Transferable Machine. (arXiv:2011.13629v2 [cs.LG] UPDATED)
    Feature-based transfer is one of the most effective methodologies for transfer learning. Existing studies usually assume that the learned new feature representation is \emph{domain-invariant}, and thus train a transfer model $\mathcal{M}$ on the source domain. In this paper, we consider a more realistic scenario where the new feature representation is suboptimal and small divergence still exists across domains. We propose a new transfer model called Randomized Transferable Machine (RTM) to handle such small divergence of domains. Specifically, we work on the new source and target data learned from existing feature-based transfer methods. The key idea is to enlarge source training data populations by randomly corrupting the new source data using some noises, and then train a transfer model $\widetilde{\mathcal{M}}$ that performs well on all the corrupted source data populations. In principle, the more corruptions are made, the higher the probability of the new target data can be covered by the constructed source data populations, and thus better transfer performance can be achieved by $\widetilde{\mathcal{M}}$. An ideal case is with infinite corruptions, which however is infeasible in reality. We develop a marginalized solution that enables to train an $\widetilde{\mathcal{M}}$ without conducting any corruption but equivalent to be trained using infinite source noisy data populations. We further propose two instantiations of $\widetilde{\mathcal{M}}$, which theoretically show the transfer superiority over the conventional transfer model $\mathcal{M}$. More importantly, both instantiations have closed-form solutions, leading to a fast and efficient training process. Experiments on various real-world transfer tasks show that RTM is a promising transfer model.
    Towards Deepening Graph Neural Networks: A GNTK-based Optimization Perspective. (arXiv:2103.03113v3 [cs.LG] UPDATED)
    Graph convolutional networks (GCNs) and their variants have achieved great success in dealing with graph-structured data. Nevertheless, it is well known that deep GCNs suffer from the over-smoothing problem, where node representations tend to be indistinguishable as more layers are stacked up. The theoretical research to date on deep GCNs has focused primarily on expressive power rather than trainability, an optimization perspective. Compared to expressivity, trainability attempts to address a more fundamental question: Given a sufficiently expressive space of models, can we successfully find a good solution via gradient descent-based optimizers? This work fills this gap by exploiting the Graph Neural Tangent Kernel (GNTK), which governs the optimization trajectory under gradient descent for wide GCNs. We formulate the asymptotic behaviors of GNTK in the large depth, which enables us to reveal the dropping trainability of wide and deep GCNs at an exponential rate in the optimization process. Additionally, we extend our theoretical framework to analyze residual connection-based techniques, which are found to be merely able to mitigate the exponential decay of trainability mildly. Inspired by our theoretical insights on trainability, we propose Critical DropEdge, a connectivity-aware and graph-adaptive sampling method, to alleviate the exponential decay problem more fundamentally. Experimental evaluation consistently confirms using our proposed method can achieve better results compared to relevant counterparts with both infinite-width and finite-width.
    Hybrid Memoised Wake-Sleep: Approximate Inference at the Discrete-Continuous Interface. (arXiv:2107.06393v2 [cs.CV] UPDATED)
    Modeling complex phenomena typically involves the use of both discrete and continuous variables. Such a setting applies across a wide range of problems, from identifying trends in time-series data to performing effective compositional scene understanding in images. Here, we propose Hybrid Memoised Wake-Sleep (HMWS), an algorithm for effective inference in such hybrid discrete-continuous models. Prior approaches to learning suffer as they need to perform repeated expensive inner-loop discrete inference. We build on a recent approach, Memoised Wake-Sleep (MWS), which alleviates part of the problem by memoising discrete variables, and extend it to allow for a principled and effective way to handle continuous variables by learning a separate recognition model used for importance-sampling based approximate inference and marginalization. We evaluate HMWS in the GP-kernel learning and 3D scene understanding domains, and show that it outperforms current state-of-the-art inference methods.
    Out-of-distribution generalization for learning quantum dynamics. (arXiv:2204.10268v1 [quant-ph])
    Generalization bounds are a critical tool to assess the training data requirements of Quantum Machine Learning (QML). Recent work has established guarantees for in-distribution generalization of quantum neural networks (QNNs), where training and testing data are assumed to be drawn from the same data distribution. However, there are currently no results on out-of-distribution generalization in QML, where we require a trained model to perform well even on data drawn from a distribution different from the training distribution. In this work, we prove out-of-distribution generalization for the task of learning an unknown unitary using a QNN and for a broad class of training and testing distributions. In particular, we show that one can learn the action of a unitary on entangled states using only product state training data. We numerically illustrate this by showing that the evolution of a Heisenberg spin chain can be learned using only product training states. Since product states can be prepared using only single-qubit gates, this advances the prospects of learning quantum dynamics using near term quantum computers and quantum experiments, and further opens up new methods for both the classical and quantum compilation of quantum circuits.  ( 2 min )
    Anti-Jamming Games in Multi-Band Wireless Ad Hoc Networks. (arXiv:2111.11178v2 [cs.IT] UPDATED)
    For multi-band wireless ad hoc networks of multiple users, an anti-jamming game between the users and a jammer is studied. In this game, the users (resp. jammer) want to maximize (resp. minimize) the expected rewards of the users taking into account various factors such as communication rate, hopping cost, and jamming loss. We analyze the arms race of the game and derive an optimal frequency hopping policy at each stage of the arms race based on the Markov decision process (MDP). It is analytically shown that the arms race reaches an equilibrium after a few rounds, and a frequency hopping policy and a jamming strategy at the equilibrium are characterized. We propose two kinds of collision avoidance protocols to ensure that at most one user communicates in each frequency band, and provide various numerical results that show the effects of the reward parameters and collision avoidance protocols on the optimal frequency hopping policy and the expected rewards at the equilibrium. Moreover, we discuss about equilibria for the case where the jammer adopts some unpredictable jamming strategies.
    Distributed Learning for Vehicular Dynamic Spectrum Access in Autonomous Driving. (arXiv:2204.10179v1 [cs.NI])
    Reliable wireless communication between the autonomously driving cars is one of the fundamental needs for guaranteeing passenger safety and comfort. However, when the number of communicating cars increases, the transmission quality may be significantly degraded due to too high occupancy radio of the used frequency band. In this paper, we concentrate on the autonomous vehicle-platooning use-case, where intra-platoon communication is done in the dynamically selected frequency band, other than nominally devoted for such purposes. The carrier selection is done in a flexible manner with the support of the context database located at the roadside unit (edge of wireless communication infrastructure). However, as the database delivers only context information to the platoons' leaders, the final decision is made separately by the individual platoons, following the suggestions made by the artificial intelligence algorithms. In this work, we concentrate on a lightweight Q-learning solution, that could be successfully implemented in each car for dynamic channel selection.
    Merging of neural networks. (arXiv:2204.09973v1 [cs.LG])
    We propose a simple scheme for merging two neural networks trained with different starting initialization into a single one with the same size as the original ones. We do this by carefully selecting channels from each input network. Our procedure might be used as a finalization step after one tries multiple starting seeds to avoid an unlucky one. We also show that training two networks and merging them leads to better performance than training a single network for an extended period of time. Availability: https://github.com/fmfi-compbio/neural-network-merging
    OUR-GAN: One-shot Ultra-high-Resolution Generative Adversarial Networks. (arXiv:2202.13799v2 [cs.CV] UPDATED)
    We propose OUR-GAN, the first one-shot ultra-high-resolution (UHR) image synthesis framework that generates non-repetitive images with 4K or higher resolution from a single training image. OUR-GAN generates a visually coherent image at low resolution and then gradually increases the resolution by super-resolution. Since OUR-GAN learns from a real UHR image, it can synthesize large-scale shapes with fine details while maintaining long-range coherence, which is difficult with conventional generative models that generate large images based on the patch distribution learned from relatively small images. OUR-GAN applies seamless subregion-wise super-resolution that synthesizes 4k or higher UHR images with limited memory, preventing discontinuity at the boundary. Additionally, OUR-GAN improves visual coherence maintaining diversity by adding vertical positional embeddings to the feature maps. In experiments on the ST4K and RAISE datasets, OUR-GAN exhibited improved fidelity, visual coherency, and diversity compared with existing methods. The synthesized images are presented at https://anonymous-62348.github.io.
    A System for Interactive Examination of Learned Security Policies. (arXiv:2204.01126v2 [cs.CR] UPDATED)
    We present a system for interactive examination of learned security policies. It allows a user to traverse episodes of Markov decision processes in a controlled manner and to track the actions triggered by security policies. Similar to a software debugger, a user can continue or or halt an episode at any time step and inspect parameters and probability distributions of interest. The system enables insight into the structure of a given policy and in the behavior of a policy in edge cases. We demonstrate the system with a network intrusion use case. We examine the evolution of an IT infrastructure's state and the actions prescribed by security policies while an attack occurs. The policies for the demonstration have been obtained through a reinforcement learning approach that includes a simulation system where policies are incrementally learned and an emulation system that produces statistics that drive the simulation runs.
    INSPIRE: Distributed Bayesian Optimization for ImproviNg SPatIal REuse in Dense WLANs. (arXiv:2204.10184v1 [cs.NI])
    WLANs, which have overtaken wired networks to become the primary means of connecting devices to the Internet, are prone to performance issues due to the scarcity of space in the radio spectrum. As a response, IEEE 802.11ax and subsequent amendments aim at increasing the spatial reuse of a radio channel by allowing the dynamic update of two key parameters in wireless transmission: the transmission power (TX_POWER) and the sensitivity threshold (OBSS_PD). In this paper, we present INSPIRE, a distributed solution performing local Bayesian optimizations based on Gaussian processes to improve the spatial reuse in WLANs. INSPIRE makes no explicit assumptions about the topology of WLANs and favors altruistic behaviors of the access points, leading them to find adequate configurations of their TX_POWER and OBSS_PD parameters for the "greater good" of the WLANs. We demonstrate the superiority of INSPIRE over other state-of-the-art strategies using the ns-3 simulator and two examples inspired by real-life deployments of dense WLANs. Our results show that, in only a few seconds, INSPIRE is able to drastically increase the quality of service of operational WLANs by improving their fairness and throughput.
    BABD: A Bitcoin Address Behavior Dataset for Address Behavior Pattern Analysis. (arXiv:2204.05746v2 [cs.CR] UPDATED)
    Cryptocurrencies are no longer just the preferred option for cybercriminal activities on darknets, due to the increasing adoption in mainstream applications. This is partly due to the transparency associated with the underpinning ledgers, where any individual can access the record of a transaction record on the public ledger. In this paper, we build a dataset comprising Bitcoin transactions between 12 July 2019 and 26 May 2021. This dataset (hereafter referred to as BABD-13) contains 13 types of Bitcoin addresses, 5 categories of indicators with 148 features, and 544,462 labeled data. We then use our proposed dataset on common machine learning models, namely: k-nearest neighbors algorithm, decision tree, random forest, multilayer perceptron, and XGBoost. The results show that the accuracy rates of these machine learning models on our proposed dataset are between 93.24% and 96.71%. We also analyze the proposed features and their relationships from the experiments, and propose a k-hop subgraph generation algorithm to extract a k-hop subgraph from the entire Bitcoin transaction graph constructed by the directed heterogeneous multigraph starting from a specific Bitcoin address node (e.g., a known transaction associated with a criminal investigation).
    How Well Do Sparse Imagenet Models Transfer?. (arXiv:2111.13445v5 [cs.CV] UPDATED)
    Transfer learning is a classic paradigm by which models pretrained on large "upstream" datasets are adapted to yield good results on "downstream" specialized datasets. Generally, more accurate models on the "upstream" dataset tend to provide better transfer accuracy "downstream". In this work, we perform an in-depth investigation of this phenomenon in the context of convolutional neural networks (CNNs) trained on the ImageNet dataset, which have been pruned - that is, compressed by sparsifying their connections. We consider transfer using unstructured pruned models obtained by applying several state-of-the-art pruning methods, including magnitude-based, second-order, re-growth, lottery-ticket, and regularization approaches, in the context of twelve standard transfer tasks. In a nutshell, our study shows that sparse models can match or even outperform the transfer performance of dense models, even at high sparsities, and, while doing so, can lead to significant inference and even training speedups. At the same time, we observe and analyze significant differences in the behaviour of different pruning methods.
    Towards Resolving Propensity Contradiction in Offline Recommender Learning. (arXiv:1910.07295v6 [stat.ML] UPDATED)
    We study offline recommender learning from explicit rating feedback in the presence of selection bias. A current promising solution for the bias is the inverse propensity score (IPS) estimation. However, the performance of existing propensity-based methods can suffer significantly from the propensity estimation bias. In fact, most of the previous IPS-based methods require some amount of missing-completely-at-random (MCAR) data to accurately estimate the propensity. This leads to a critical self-contradiction; IPS is ineffective without MCAR data, even though it originally aims to learn recommenders from only missing-not-at-random feedback. To resolve this propensity contradiction, we derive a propensity-independent generalization error bound and propose a novel algorithm to minimize the theoretical bound via adversarial learning. Our theory and algorithm do not require a propensity estimation procedure, thereby leading to a well-performing rating predictor without the true propensity information. Extensive experiments demonstrate that the proposed approach is superior to a range of existing methods both in rating prediction and ranking metrics in practical settings without MCAR data.  ( 2 min )
    Energy-Efficient Parking Analytics System using Deep Reinforcement Learning. (arXiv:2202.08973v2 [cs.CV] UPDATED)
    Advances in deep vision techniques and ubiquity of smart cameras will drive the next generation of video analytics. However, video analytics applications consume vast amounts of energy as both deep learning techniques and cameras are power-hungry. In this paper, we focus on a parking video analytics platform and propose RL-CamSleep, a deep reinforcement learning-based technique, to actuate the cameras to reduce the energy footprint while retaining the system's utility. Our key insight is that many video-analytics applications do not always need to be operational, and we can design policies to activate video analytics only when necessary. Moreover, our work is complementary to existing work that focuses on improving hardware and software efficiency. We evaluate our approach on a city-scale parking dataset having 76 streets spread across the city. Our analysis demonstrates how streets have various parking patterns, highlighting the importance of an adaptive policy. Our approach can learn such an adaptive policy that can reduce the average energy consumption by 76.38% and achieve an average accuracy of more than 98% in performing video analytics.
    The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization. (arXiv:2110.07732v3 [cs.LG] UPDATED)
    Despite progress across a broad range of applications, Transformers have limited success in systematic generalization. The situation is especially frustrating in the case of algorithmic tasks, where they often fail to find intuitive solutions that route relevant information to the right node/operation at the right time in the grid represented by Transformer columns. To facilitate the learning of useful control flow, we propose two modifications to the Transformer architecture, copy gate and geometric attention. Our novel Neural Data Router (NDR) achieves 100% length generalization accuracy on the classic compositional table lookup task, as well as near-perfect accuracy on the simple arithmetic task and a new variant of ListOps testing for generalization across computational depths. NDR's attention and gating patterns tend to be interpretable as an intuitive form of neural routing. Our code is public.
    On the Certified Robustness for Ensemble Models and Beyond. (arXiv:2107.10873v2 [cs.LG] UPDATED)
    Recent studies show that deep neural networks (DNN) are vulnerable to adversarial examples, which aim to mislead DNNs by adding perturbations with small magnitude. To defend against such attacks, both empirical and theoretical defense approaches have been extensively studied for a single ML model. In this work, we aim to analyze and provide the certified robustness for ensemble ML models, together with the sufficient and necessary conditions of robustness for different ensemble protocols. Although ensemble models are shown more robust than a single model empirically; surprisingly, we find that in terms of the certified robustness the standard ensemble models only achieve marginal improvement compared to a single model. Thus, to explore the conditions that guarantee to provide certifiably robust ensemble ML models, we first prove that diversified gradient and large confidence margin are sufficient and necessary conditions for certifiably robust ensemble models under the model-smoothness assumption. We then provide the bounded model-smoothness analysis based on the proposed Ensemble-before-Smoothing strategy. We also prove that an ensemble model can always achieve higher certified robustness than a single base model under mild conditions. Inspired by the theoretical findings, we propose the lightweight Diversity Regularized Training (DRT) to train certifiably robust ensemble ML models. Extensive experiments show that our DRT enhanced ensembles can consistently achieve higher certified robustness than existing single and ensemble ML models, demonstrating the state-of-the-art certified L2-robustness on MNIST, CIFAR-10, and ImageNet datasets.  ( 2 min )
    A Survey and Perspective on Artificial Intelligence for Security-Aware Electronic Design Automation. (arXiv:2204.09579v2 [cs.LG] UPDATED)
    Artificial intelligence (AI) and machine learning (ML) techniques have been increasingly used in several fields to improve performance and the level of automation. In recent years, this use has exponentially increased due to the advancement of high-performance computing and the ever increasing size of data. One of such fields is that of hardware design; specifically the design of digital and analog integrated circuits~(ICs), where AI/ ML techniques have been extensively used to address ever-increasing design complexity, aggressive time-to-market, and the growing number of ubiquitous interconnected devices (IoT). However, the security concerns and issues related to IC design have been highly overlooked. In this paper, we summarize the state-of-the-art in AL/ML for circuit design/optimization, security and engineering challenges, research in security-aware CAD/EDA, and future research directions and needs for using AI/ML for security-aware circuit design.  ( 2 min )
    ESS: Learning Event-based Semantic Segmentation from Still Images. (arXiv:2203.10016v1 [cs.CV] CROSS LISTED)
    Retrieving accurate semantic information in challenging high dynamic range (HDR) and high-speed conditions remains an open challenge for image-based algorithms due to severe image degradations. Event cameras promise to address these challenges since they feature a much higher dynamic range and are resilient to motion blur. Nonetheless, semantic segmentation with event cameras is still in its infancy which is chiefly due to the novelty of the sensor, and the lack of high-quality, labeled datasets. In this work, we introduce ESS, which tackles this problem by directly transferring the semantic segmentation task from existing labeled image datasets to unlabeled events via unsupervised domain adaptation (UDA). Compared to existing UDA methods, our approach aligns recurrent, motion-invariant event embeddings with image embeddings. For this reason, our method neither requires video data nor per-pixel alignment between images and events and, crucially, does not need to hallucinate motion from still images. Additionally, to spur further research in event-based semantic segmentation, we introduce DSEC-Semantic, the first large-scale event-based dataset with fine-grained labels. We show that using image labels alone, ESS outperforms existing UDA approaches, and when combined with event labels, it even outperforms state-of-the-art supervised approaches on both DDD17 and DSEC-Semantic. Finally, ESS is general-purpose, which unlocks the vast amount of existing labeled image datasets and paves the way for new and exciting research directions in new fields previously inaccessible for event cameras.  ( 2 min )
    Modeling and Predicting Popularity Dynamics via Deep Learning Attention Mechanism. (arXiv:1811.02117v2 [cs.SI] UPDATED)
    An ability to predict the popularity dynamics of individual items within a complex evolving system has important implications in a wide range of domains. Here we propose a deep learning attention mechanism to model the process through which individual items gain their popularity. We analyze the interpretability of the model with the four key phenomena confirmed independently in the previous studies of long-term popularity dynamics quantification, including the intrinsic quality, the aging effect, the recency effect and the Matthew effect. We analyze the effectiveness of introducing attention model in popularity dynamics prediction. Extensive experiments on a real-large citation data set demonstrate that the designed deep learning attention mechanism possesses remarkable power at predicting the long-term popularity dynamics. It consistently outperforms the existing methods, and achieves a significant performance improvement.  ( 2 min )
    Deep Bayesian Active Learning, A Brief Survey on Recent Advances. (arXiv:2012.08044v2 [cs.LG] UPDATED)
    Active learning frameworks offer efficient data annotation without remarkable accuracy degradation. In other words, active learning starts training the model with a small size of labeled data while exploring the space of unlabeled data in order to select most informative samples to be labeled. Generally speaking, representing the uncertainty is crucial in any active learning framework, however, deep learning methods are not capable of either representing or manipulating model uncertainty. On the other hand, from the real world application perspective, uncertainty representation is getting more and more attention in the machine learning community. Deep Bayesian active learning frameworks and generally any Bayesian active learning settings, provide practical consideration in the model which allows training with small data while representing the model uncertainty for further efficient training. In this paper, we briefly survey recent advances in Bayesian active learning and in particular deep Bayesian active learning frameworks.
    Addressing Tactic Volatility in Self-Adaptive Systems Using Evolved Recurrent Neural Networks and Uncertainty Reduction Tactics. (arXiv:2204.10308v1 [cs.LG])
    Self-adaptive systems frequently use tactics to perform adaptations. Tactic examples include the implementation of additional security measures when an intrusion is detected, or activating a cooling mechanism when temperature thresholds are surpassed. Tactic volatility occurs in real-world systems and is defined as variable behavior in the attributes of a tactic, such as its latency or cost. A system's inability to effectively account for tactic volatility adversely impacts its efficiency and resiliency against the dynamics of real-world environments. To enable systems' efficiency against tactic volatility, we propose a Tactic Volatility Aware (TVA-E) process utilizing evolved Recurrent Neural Networks (eRNN) to provide accurate tactic predictions. TVA-E is also the first known process to take advantage of uncertainty reduction tactics to provide additional information to the decision-making process and reduce uncertainty. TVA-E easily integrates into popular adaptation processes enabling it to immediately benefit a large number of existing self-adaptive systems. Simulations using 52,106 tactic records demonstrate that: I) eRNN is an effective prediction mechanism, II) TVA-E represents an improvement over existing state-of-the-art processes in accounting for tactic volatility, and III) Uncertainty reduction tactics are beneficial in accounting for tactic volatility. The developed dataset and tool can be found at https://tacticvolatility.github.io/
    Lessons on Parameter Sharing across Layers in Transformers. (arXiv:2104.06022v3 [cs.CL] UPDATED)
    We propose a parameter sharing method for Transformers (Vaswani et al., 2017). The proposed approach relaxes a widely used technique, which shares parameters for one layer with all layers such as Universal Transformers (Dehghani et al., 2019), to increase the efficiency in the computational time. We propose three strategies: Sequence, Cycle, and Cycle (rev) to assign parameters to each layer. Experimental results show that the proposed strategies are efficient in the parameter size and computational time. Moreover, we indicate that the proposed strategies are also effective in the configuration where we use many training data such as the recent WMT competition.  ( 2 min )
    DropMessage: Unifying Random Dropping for Graph Neural Networks. (arXiv:2204.10037v1 [cs.LG])
    Graph Neural Networks (GNNs) are powerful tools for graph representation learning. Despite their rapid development, GNNs also faces some challenges, such as over-fitting, over-smoothing, and non-robustness. Previous works indicate that these problems can be alleviated by random dropping methods, which integrate noises into models by randomly masking parts of the input. However, some open-ended problems of random dropping on GNNs remain to solve. First, it is challenging to find a universal method that are suitable for all cases considering the divergence of different datasets and models. Second, random noises introduced to GNNs cause the incomplete coverage of parameters and unstable training process. In this paper, we propose a novel random dropping method called DropMessage, which performs dropping operations directly on the message matrix and can be applied to any message-passing GNNs. Furthermore, we elaborate the superiority of DropMessage: it stabilizes the training process by reducing sample variance; it keeps information diversity from the perspective of information theory, which makes it a theoretical upper bound of other methods. Also, we unify existing random dropping methods into our framework and analyze their effects on GNNs. To evaluate our proposed method, we conduct experiments that aims for multiple tasks on five public datasets and two industrial datasets with various backbone models. The experimental results show that DropMessage has both advantages of effectiveness and generalization.  ( 2 min )
    STONet: A Neural-Operator-Driven Spatio-temporal Network. (arXiv:2204.08414v2 [cs.LG] UPDATED)
    Graph-based spatio-temporal neural networks are effective to model the spatial dependency among discrete points sampled irregularly from unstructured grids, thanks to the great expressiveness of graph neural networks. However, these models are usually spatially-transductive -- only fitting the signals for discrete spatial nodes fed in models but unable to generalize to `unseen' spatial points with zero-shot. In comparison, for forecasting tasks on continuous space such as temperature prediction on the earth's surface, the \textit{spatially-inductive} property allows the model to generalize to any point in the spatial domain, demonstrating models' ability to learn the underlying mechanisms or physics laws of the systems, rather than simply fit the signals. Besides, in temporal domains, \textit{irregularly-sampled} time series, e.g. data with missing values, urge models to be temporally-continuous. Motivated by the two issues, we propose a spatio-temporal framework based on neural operators for PDEs, which learn the underlying mechanisms governing the dynamics of spatially-continuous physical quantities. Experiments show our model's improved performance on forecasting spatially-continuous physic quantities, and its superior generalization to unseen spatial points and ability to handle temporally-irregular data.  ( 2 min )
    MRAM-based Analog Sigmoid Function for In-memory Computing. (arXiv:2204.09918v1 [cs.ET])
    We propose an analog implementation of the transcendental activation function leveraging two spin-orbit torque magnetoresistive random-access memory (SOT-MRAM) devices and a CMOS inverter. The proposed analog neuron circuit consumes 1.8-27x less power, and occupies 2.5-4931x smaller area, compared to the state-of-the-art analog and digital implementations. Moreover, the developed neuron can be readily integrated with memristive crossbars without requiring any intermediate signal conversion units. The architecture-level analyses show that a fully-analog in-memory computing (IMC) circuit that use our SOT-MRAM neuron along with an SOT-MRAM based crossbar can achieve more than 1.1x, 12x, and 13.3x reduction in power, latency, and energy, respectively, compared to a mixed-signal implementation with analog memristive crossbars and digital neurons. Finally, through cross-layer analyses, we provide a guide on how varying the device-level parameters in our neuron can affect the accuracy of multilayer perceptron (MLP) for MNIST classification.  ( 2 min )
    Scale Dependencies and Self-Similarity Through Wavelet Scattering Covariance. (arXiv:2204.10177v1 [physics.data-an])
    We introduce a scattering covariance matrix which provides non-Gaussian models of time-series having stationary increments. A complex wavelet transform computes signal variations at each scale. Dependencies across scales are captured by the joint covariance across time and scales of complex wavelet coefficients and their modulus. This covariance is nearly diagonalized by a second wavelet transform, which defines the scattering covariance. We show that this set of moments characterizes a wide range of non-Gaussian properties of multi-scale processes. This is analyzed for a variety of processes, including fractional Brownian motions, Poisson, multifractal random walks and Hawkes processes. We prove that self-similar processes have a scattering covariance matrix which is scale invariant. This property can be estimated numerically and defines a class of wide-sense self-similar processes. We build maximum entropy models conditioned by scattering covariance coefficients, and generate new time-series with a microcanonical sampling algorithm. Applications are shown for highly non-Gaussian financial and turbulence time-series.  ( 2 min )
    NetSentry: A Deep Learning Approach to Detecting Incipient Large-scale Network Attacks. (arXiv:2202.09873v2 [cs.CR] UPDATED)
    Machine Learning (ML) techniques are increasingly adopted to tackle ever-evolving high-profile network attacks, including DDoS, botnet, and ransomware, due to their unique ability to extract complex patterns hidden in data streams. These approaches are however routinely validated with data collected in the same environment, and their performance degrades when deployed in different network topologies and/or applied on previously unseen traffic, as we uncover. This suggests malicious/benign behaviors are largely learned superficially and ML-based Network Intrusion Detection System (NIDS) need revisiting, to be effective in practice. In this paper we dive into the mechanics of large-scale network attacks, with a view to understanding how to use ML for Network Intrusion Detection (NID) in a principled way. We reveal that, although cyberattacks vary significantly in terms of payloads, vectors and targets, their early stages, which are critical to successful attack outcomes, share many similarities and exhibit important temporal correlations. Therefore, we treat NID as a time-sensitive task and propose NetSentry, perhaps the first of its kind NIDS that builds on Bidirectional Asymmetric LSTM (Bi-ALSTM), an original ensemble of sequential neural models, to detect network threats before they spread. We cross-evaluate NetSentry using two practical datasets, training on one and testing on the other, and demonstrate F1 score gains above 33% over the state-of-the-art, as well as up to 3 times higher rates of detecting attacks such as XSS and web bruteforce. Further, we put forward a novel data augmentation technique that boosts the generalization abilities of a broad range of supervised deep learning algorithms, leading to average F1 score gains above 35%.  ( 2 min )
    Infographics Wizard: Flexible Infographics Authoring and Design Exploration. (arXiv:2204.09904v1 [cs.HC])
    Infographics are an aesthetic visual representation of information following specific design principles of human perception. Designing infographics can be a tedious process for non-experts and time-consuming, even for professional designers. With the help of designers, we propose a semi-automated infographic framework for general structured and flow-based infographic design generation. For novice designers, our framework automatically creates and ranks infographic designs for a user-provided text with no requirement for design input. However, expert designers can still provide custom design inputs to customize the infographics. We will also contribute an individual visual group (VG) designs dataset (in SVG), along with a 1k complete infographic image dataset with segmented VGs in this work. Evaluation results confirm that by using our framework, designers from all expertise levels can generate generic infographic designs faster than existing methods while maintaining the same quality as hand-designed infographics templates.  ( 2 min )
    From Stars to Subgraphs: Uplifting Any GNN with Local Structure Awareness. (arXiv:2110.03753v3 [cs.LG] UPDATED)
    Message Passing Neural Networks (MPNNs) are a common type of Graph Neural Network (GNN), in which each node's representation is computed recursively by aggregating representations (messages) from its immediate neighbors akin to a star-shaped pattern. MPNNs are appealing for being efficient and scalable, how-ever their expressiveness is upper-bounded by the 1st-order Weisfeiler-Lehman isomorphism test (1-WL). In response, prior works propose highly expressive models at the cost of scalability and sometimes generalization performance. Our work stands between these two regimes: we introduce a general framework to uplift any MPNN to be more expressive, with limited scalability overhead and greatly improved practical performance. We achieve this by extending local aggregation in MPNNs from star patterns to general subgraph patterns (e.g.,k-egonets):in our framework, each node representation is computed as the encoding of a surrounding induced subgraph rather than encoding of immediate neighbors only (i.e. a star). We choose the subgraph encoder to be a GNN (mainly MPNNs, considering scalability) to design a general framework that serves as a wrapper to up-lift any GNN. We call our proposed method GNN-AK(GNN As Kernel), as the framework resembles a convolutional neural network by replacing the kernel with GNNs. Theoretically, we show that our framework is strictly more powerful than 1&2-WL, and is not less powerful than 3-WL. We also design subgraph sampling strategies which greatly reduce memory footprint and improve speed while maintaining performance. Our method sets new state-of-the-art performance by large margins for several well-known graph ML tasks; specifically, 0.08 MAE on ZINC,74.79% and 86.887% accuracy on CIFAR10 and PATTERN respectively.  ( 2 min )
    CNLL: A Semi-supervised Approach For Continual Noisy Label Learning. (arXiv:2204.09881v1 [cs.CV])
    The task of continual learning requires careful design of algorithms that can tackle catastrophic forgetting. However, the noisy label, which is inevitable in a real-world scenario, seems to exacerbate the situation. While very few studies have addressed the issue of continual learning under noisy labels, long training time and complicated training schemes limit their applications in most cases. In contrast, we propose a simple purification technique to effectively cleanse the online data stream that is both cost-effective and more accurate. After purification, we perform fine-tuning in a semi-supervised fashion that ensures the participation of all available samples. Training in this fashion helps us learn a better representation that results in state-of-the-art (SOTA) performance. Through extensive experimentation on 3 benchmark datasets, MNIST, CIFAR10 and CIFAR100, we show the effectiveness of our proposed approach. We achieve a 24.8% performance gain for CIFAR10 with 20% noise over previous SOTA methods. Our code is publicly available.  ( 2 min )
    Holmes: An Efficient and Lightweight Semantic Based Anomalous Email Detector. (arXiv:2104.08044v11 [cs.CR] UPDATED)
    Email threat is a serious issue for enterprise security, which consists of various malicious scenarios, such as phishing, fraud, blackmail and malvertisement. Traditional anti-spam gateway commonly requires to maintain a greylist to filter out unexpected emails based on suspicious vocabularies existed in the mail subject and content. However, the signature-based approach cannot effectively discover novel and unknown suspicious emails that utilize various hot topics at present, such as COVID-19 and US election. To address the problem, in this paper, we present Holmes, an efficient and lightweight semantic based engine for anomalous email detection. Holmes can convert each event log of email to a sentence through word embedding then extract interesting items among them by novelty detection. Based on our observations, we claim that, in an enterprise environment, there is a stable relation between senders and receivers, but suspicious emails are commonly from unusual sources, which can be detected through the rareness selection. We evaluate the performance of Holmes in a real-world enterprise environment, in which it sends and receives around 5,000 emails each day. As a result, Holmes can achieve a high detection rate (output around 200 suspicious emails per day) and maintain a low false alarm rate for anomaly detection.  ( 3 min )
    Neural Topic Modeling of Psychotherapy Sessions. (arXiv:2204.10189v1 [cs.CL])
    In this work, we compare different neural topic modeling methods in learning the topical propensities of different psychiatric conditions from the psychotherapy session transcripts parsed from speech recordings. We also incorporate temporal modeling to put this additional interpretability to action by parsing out topic similarities as a time series in a turn-level resolution. We believe this topic modeling framework can offer interpretable insights for the therapist to optimally decide his or her strategy and improve the psychotherapy effectiveness.  ( 2 min )
    Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data. (arXiv:2010.03622v5 [cs.LG] UPDATED)
    Self-training algorithms, which train a model to fit pseudolabels predicted by another previously-learned model, have been very successful for learning with unlabeled data using neural networks. However, the current theoretical understanding of self-training only applies to linear models. This work provides a unified theoretical analysis of self-training with deep networks for semi-supervised learning, unsupervised domain adaptation, and unsupervised learning. At the core of our analysis is a simple but realistic "expansion" assumption, which states that a low probability subset of the data must expand to a neighborhood with large probability relative to the subset. We also assume that neighborhoods of examples in different classes have minimal overlap. We prove that under these assumptions, the minimizers of population objectives based on self-training and input-consistency regularization will achieve high accuracy with respect to ground-truth labels. By using off-the-shelf generalization bounds, we immediately convert this result to sample complexity guarantees for neural nets that are polynomial in the margin and Lipschitzness. Our results help explain the empirical successes of recently proposed self-training algorithms which use input consistency regularization.  ( 2 min )
    Why I'm not Answering: Understanding Determinants of Classification of an Abstaining Classifier for Cancer Pathology Reports. (arXiv:2009.05094v5 [cs.LG] UPDATED)
    Safe deployment of deep learning systems in critical real world applications requires models to make very few mistakes, and only under predictable circumstances. In this work, we address this problem using an abstaining classifier that is tuned to have $>$95% accuracy, and then identify the determinants of abstention using LIME. Essentially, we are training our model to learn the attributes of pathology reports that are likely to lead to incorrect classifications, albeit at the cost of reduced sensitivity. We demonstrate an abstaining classifier in a multitask setting for classifying cancer pathology reports from the NCI SEER cancer registries on six tasks of interest. For these tasks, we reduce the classification error rate by factors of 2--5 by abstaining on 25--45% of the reports. For the specific task of classifying cancer site, we are able to identify metastasis, reports involving lymph nodes, and discussion of multiple cancer sites as responsible for many of the classification mistakes, and observe that the extent and types of mistakes vary systematically with cancer site (e.g., breast, lung, and prostate). When combining across three of the tasks, our model classifies 50% of the reports with an accuracy greater than 95% for three of the six tasks\edit, and greater than 85% for all six tasks on the retained samples. Furthermore, we show that LIME provides a better determinant of classification than measures of word occurrence alone. By combining a deep abstaining classifier with feature identification using LIME, we are able to identify concepts responsible for both correctness and abstention when classifying cancer sites from pathology reports. The improvement of LIME over keyword searches is statistically significant, presumably because words are assessed in context and have been identified as a local determinant of classification.  ( 3 min )
    Learnable Model Augmentation Self-Supervised Learning for Sequential Recommendation. (arXiv:2204.10128v1 [cs.IR])
    Sequential Recommendation aims to predict the next item based on user behaviour. Recently, Self-Supervised Learning (SSL) has been proposed to improve recommendation performance. However, most of existing SSL methods use a uniform data augmentation scheme, which loses the sequence correlation of an original sequence. To this end, in this paper, we propose a Learnable Model Augmentation self-supervised learning for sequential Recommendation (LMA4Rec). Specifically, LMA4Rec first takes model augmentation as a supplementary method for data augmentation to generate views. Then, LMA4Rec uses learnable Bernoulli dropout to implement model augmentation learnable operations. Next, self-supervised learning is used between the contrastive views to extract self-supervised signals from an original sequence. Finally, experiments on three public datasets show that the LMA4Rec method effectively improves sequential recommendation performance compared with baseline methods.  ( 2 min )
    TorchSparse: Efficient Point Cloud Inference Engine. (arXiv:2204.10319v1 [cs.LG])
    Deep learning on point clouds has received increased attention thanks to its wide applications in AR/VR and autonomous driving. These applications require low latency and high accuracy to provide real-time user experience and ensure user safety. Unlike conventional dense workloads, the sparse and irregular nature of point clouds poses severe challenges to running sparse CNNs efficiently on the general-purpose hardware. Furthermore, existing sparse acceleration techniques for 2D images do not translate to 3D point clouds. In this paper, we introduce TorchSparse, a high-performance point cloud inference engine that accelerates the sparse convolution computation on GPUs. TorchSparse directly optimizes the two bottlenecks of sparse convolution: irregular computation and data movement. It applies adaptive matrix multiplication grouping to trade computation for better regularity, achieving 1.4-1.5x speedup for matrix multiplication. It also optimizes the data movement by adopting vectorized, quantized and fused locality-aware memory access, reducing the memory movement cost by 2.7x. Evaluated on seven representative models across three benchmark datasets, TorchSparse achieves 1.6x and 1.5x measured end-to-end speedup over the state-of-the-art MinkowskiEngine and SpConv, respectively.  ( 2 min )
    Learning Future Object Prediction with a Spatiotemporal Detection Transformer. (arXiv:2204.10321v1 [cs.CV])
    We explore future object prediction -- a challenging problem where all objects visible in a future video frame are to be predicted. We propose to tackle this problem end-to-end by training a detection transformer to directly output future objects. In order to make accurate predictions about the future, it is necessary to capture the dynamics in the scene, both of other objects and of the ego-camera. We extend existing detection transformers in two ways to capture the scene dynamics. First, we experiment with three different mechanisms that enable the model to spatiotemporally process multiple frames. Second, we feed ego-motion information to the model via cross-attention. We show that both of these cues substantially improve future object prediction performance. Our final approach learns to capture the dynamics and make predictions on par with an oracle for 100 ms prediction horizons, and outperform baselines for longer prediction horizons.  ( 2 min )
    Exploring Structural Sparsity of Deep Networks via Inverse Scale Spaces. (arXiv:1905.09449v5 [cs.LG] UPDATED)
    The great success of deep neural networks is built upon their over-parameterization, which smooths the optimization landscape without degrading the generalization ability. Despite the benefits of over-parameterization, a huge amount of parameters makes deep networks cumbersome in daily life applications. Though techniques such as pruning and distillation are developed, they are expensive in fully training a dense network as backward selection methods, and there is still a void on systematically exploring forward selection methods for learning structural sparsity in deep networks. To fill in this gap, this paper proposes a new approach based on differential inclusions of inverse scale spaces, which generate a family of models from simple to complex ones along the dynamics via coupling a pair of parameters, such that over-parameterized deep models and their structural sparsity can be explored simultaneously. This kind of differential inclusion scheme has a simple discretization, dubbed Deep structure splitting Linearized Bregman Iteration (DessiLBI), whose global convergence in learning deep networks could be established under the Kurdyka-Lojasiewicz framework. Experimental evidence shows that our method achieves comparable and even better performance than the competitive optimizers in exploring the sparse structure of several widely used backbones on the benchmark datasets. Remarkably, with early stopping, our method unveils `winning tickets' in early epochs: the effective sparse network structures with comparable test accuracy to fully trained over-parameterized models, that are further transferable to similar alternative tasks. Furthermore, our method is able to grow networks efficiently with adaptive filter configurations, demonstrating a good performance with much less computational cost. Codes and models can be downloaded at {https://github.com/DessiLBI2020/DessiLBI}.  ( 3 min )
    Multi-Component Optimization and Efficient Deployment of Neural-Networks on Resource-Constrained IoT Hardware. (arXiv:2204.10183v1 [cs.LG])
    The majority of IoT devices like smartwatches, smart plugs, HVAC controllers, etc., are powered by hardware with a constrained specification (low memory, clock speed and processor) which is insufficient to accommodate and execute large, high-quality models. On such resource-constrained devices, manufacturers still manage to provide attractive functionalities (to boost sales) by following the traditional approach of programming IoT devices/products to collect and transmit data (image, audio, sensor readings, etc.) to their cloud-based ML analytics platforms. For decades, this online approach has been facing issues such as compromised data streams, non-real-time analytics due to latency, bandwidth constraints, costly subscriptions, recent privacy issues raised by users and the GDPR guidelines, etc. In this paper, to enable ultra-fast and accurate AI-based offline analytics on resource-constrained IoT devices, we present an end-to-end multi-component model optimization sequence and open-source its implementation. Researchers and developers can use our optimization sequence to optimize high memory, computation demanding models in multiple aspects in order to produce small size, low latency, low-power consuming models that can comfortably fit and execute on resource-constrained hardware. The experimental results show that our optimization components can produce models that are; (i) 12.06 x times compressed; (ii) 0.13% to 0.27% more accurate; (iii) Orders of magnitude faster unit inference at 0.06 ms. Our optimization sequence is generic and can be applied to any state-of-the-art models trained for anomaly detection, predictive maintenance, robotics, voice recognition, and machine vision.  ( 2 min )
    A Sandbox Tool to Bias(Stress)-Test Fairness Algorithms. (arXiv:2204.10233v1 [cs.LG])
    Motivated by the growing importance of reducing unfairness in ML predictions, Fair-ML researchers have presented an extensive suite of algorithmic "fairness-enhancing" remedies. Most existing algorithms, however, are agnostic to the sources of the observed unfairness. As a result, the literature currently lacks guiding frameworks to specify conditions under which each algorithmic intervention can potentially alleviate the underpinning cause of unfairness. To close this gap, we scrutinize the underlying biases (e.g., in the training data or design choices) that cause observational unfairness. We present a bias-injection sandbox tool to investigate fairness consequences of various biases and assess the effectiveness of algorithmic remedies in the presence of specific types of bias. We call this process the bias(stress)-testing of algorithmic interventions. Unlike existing toolkits, ours provides a controlled environment to counterfactually inject biases in the ML pipeline. This stylized setup offers the distinct capability of testing fairness interventions beyond observational data and against an unbiased benchmark. In particular, we can test whether a given remedy can alleviate the injected bias by comparing the predictions resulting after the intervention in the biased setting with true labels in the unbiased regime -- that is, before any bias injection. We illustrate the utility of our toolkit via a proof-of-concept case study on synthetic data. Our empirical analysis showcases the type of insights that can be obtained through our simulations.  ( 2 min )
    Feature anomaly detection system (FADS) for intelligent manufacturing. (arXiv:2204.10318v1 [cs.CV])
    Anomaly detection is important for industrial automation and part quality assurance, and while humans can easily detect anomalies in components given a few examples, designing a generic automated system that can perform at human or above human capabilities remains a challenge. In this work, we present a simple new anomaly detection algorithm called FADS (feature-based anomaly detection system) which leverages pretrained convolutional neural networks (CNN) to generate a statistical model of nominal inputs by observing the activation of the convolutional filters. During inference the system compares the convolutional filter activation of the new input to the statistical model and flags activations that are outside the expected range of values and therefore likely an anomaly. By using a pretrained network, FADS demonstrates excellent performance similar to or better than other machine learning approaches to anomaly detection while at the same time FADS requires no tuning of the CNN weights. We demonstrate FADS ability by detecting process parameter changes on a custom dataset of additively manufactured lattices. The FADS localization algorithm shows that textural differences that are visible on the surface can be used to detect process parameter changes. In addition, we test FADS on benchmark datasets, such as the MVTec Anomaly Detection dataset, and report good results.  ( 2 min )
    A two-level machine learning framework for predictive maintenance: comparison of learning formulations. (arXiv:2204.10083v1 [cs.LG])
    Predicting incoming failures and scheduling maintenance based on sensors information in industrial machines is increasingly important to avoid downtime and machine failure. Different machine learning formulations can be used to solve the predictive maintenance problem. However, many of the approaches studied in the literature are not directly applicable to real-life scenarios. Indeed, many of those approaches usually either rely on labelled machine malfunctions in the case of classification and fault detection, or rely on finding a monotonic health indicator on which a prediction can be made in the case of regression and remaining useful life estimation, which is not always feasible. Moreover, the decision-making part of the problem is not always studied in conjunction with the prediction phase. This paper aims to design and compare different formulations for predictive maintenance in a two-level framework and design metrics that quantify both the failure detection performance as well as the timing of the maintenance decision. The first level is responsible for building a health indicator by aggregating features using a learning algorithm. The second level consists of a decision-making system that can trigger an alarm based on this health indicator. Three degrees of refinements are compared in the first level of the framework, from simple threshold-based univariate predictive technique to supervised learning methods based on the remaining time before failure. We choose to use the Support Vector Machine (SVM) and its variations as the common algorithm used in all the formulations. We apply and compare the different strategies on a real-world rotating machine case study and observe that while a simple model can already perform well, more sophisticated refinements enhance the predictions for well-chosen parameters.  ( 2 min )
    Learning spatiotemporal features from incomplete data for traffic flow prediction using hybrid deep neural networks. (arXiv:2204.10222v1 [cs.LG])
    Urban traffic flow prediction using data-driven models can play an important role in route planning and preventing congestion on highways. These methods utilize data collected from traffic recording stations at different timestamps to predict the future status of traffic. Hence, data collection, transmission, storage, and extraction techniques can have a significant impact on the performance of the traffic flow model. On the other hand, a comprehensive database can provide the opportunity for using complex, yet reliable predictive models such as deep learning methods. However, most of these methods have difficulties in handling missing values and outliers. This study focuses on hybrid deep neural networks to predict traffic flow in the California Freeway Performance Measurement System (PeMS) with missing values. The proposed networks are based on a combination of recurrent neural networks (RNNs) to consider the temporal dependencies in the data recorded in each station and convolutional neural networks (CNNs) to take the spatial correlations in the adjacent stations into account. Various architecture configurations with series and parallel connections are considered based on RNNs and CNNs, and several prevalent data imputation techniques are used to examine the robustness of the hybrid networks to missing values. A comprehensive analysis performed on two different datasets from PeMS indicates that the proposed series-parallel hybrid network with the mean imputation technique achieves the lowest error in predicting the traffic flow and is robust to missing values up until 21% missing ratio in both complete and incomplete training data scenarios when applied to an incomplete test data.  ( 2 min )
    Automated analysis of fibrous cap in intravascular optical coherence tomography images of coronary arteries. (arXiv:2204.10162v1 [cs.LG])
    Thin-cap fibroatheroma (TCFA) and plaque rupture have been recognized as the most frequent risk factor for thrombosis and acute coronary syndrome. Intravascular optical coherence tomography (IVOCT) can identify TCFA and assess cap thickness, which provides an opportunity to assess plaque vulnerability. We developed an automated method that can detect lipidous plaque and assess fibrous cap thickness in IVOCT images. This study analyzed a total of 4,360 IVOCT image frames of 77 lesions among 41 patients. To improve segmentation performance, preprocessing included lumen segmentation, pixel-shifting, and noise filtering on the raw polar (r, theta) IVOCT images. We used the DeepLab-v3 plus deep learning model to classify lipidous plaque pixels. After lipid detection, we automatically detected the outer border of the fibrous cap using a special dynamic programming algorithm and assessed the cap thickness. Our method provided excellent discriminability of lipid plaque with a sensitivity of 85.8% and A-line Dice coefficient of 0.837. By comparing lipid angle measurements between two analysts following editing of our automated software, we found good agreement by Bland-Altman analysis (difference 6.7+/-17 degree; mean 196 degree). Our method accurately detected the fibrous cap from the detected lipid plaque. Automated analysis required a significant modification for only 5.5% frames. Furthermore, our method showed a good agreement of fibrous cap thickness between two analysts with Bland-Altman analysis (4.2+/-14.6 micron; mean 175 micron), indicating little bias between users and good reproducibility of the measurement. We developed a fully automated method for fibrous cap quantification in IVOCT images, resulting in good agreement with determinations by analysts. The method has great potential to enable highly automated, repeatable, and comprehensive evaluations of TCFAs.  ( 2 min )
    The NIST CTS Speaker Recognition Challenge. (arXiv:2204.10228v1 [eess.AS])
    The US National Institute of Standards and Technology (NIST) has been conducting a second iteration of the CTS challenge since August 2020. The current iteration of the CTS Challenge is a leaderboard-style speaker recognition evaluation using telephony data extracted from the unexposed portions of the Call My Net 2 (CMN2) and Multi-Language Speech (MLS) corpora collected by the LDC. The CTS Challenge is currently organized in a similar manner to the SRE19 CTS Challenge, offering only an open training condition using two evaluation subsets, namely Progress and Test. Unlike in the SRE19 Challenge, no training or development set was initially released, and NIST has publicly released the leaderboards on both subsets for the CTS Challenge. Which subset (i.e., Progress or Test) a trial belongs to is unknown to challenge participants, and each system submission needs to contain outputs for all of the trials. The CTS Challenge has also served, and will continue to do so, as a prerequisite for entrance to the regular SREs (such as SRE21). Since August 2020, a total of 53 organizations (forming 33 teams) from academia and industry have participated in the CTS Challenge and submitted more than 4400 valid system outputs. This paper presents an overview of the evaluation and several analyses of system performance for some primary conditions in the CTS Challenge. The CTS Challenge results thus far indicate remarkable improvements in performance due to 1) speaker embeddings extracted using large-scale and complex neural network architectures such as ResNets along with angular margin losses for speaker embedding extraction, 2) extensive data augmentation, 3) the use of large amounts of in-house proprietary data from a large number of labeled speakers, 4) long-duration fine-tuning.  ( 2 min )
    Revisiting Gaussian mixture critic in off-policy reinforcement learning: a sample-based approach. (arXiv:2204.10256v1 [cs.LG])
    Actor-critic algorithms that make use of distributional policy evaluation have frequently been shown to outperform their non-distributional counterparts on many challenging control tasks. Examples of this behavior include the D4PG and DMPO algorithms as compared to DDPG and MPO, respectively [Barth-Maron et al., 2018; Hoffman et al., 2020]. However, both agents rely on the C51 critic for value estimation.One major drawback of the C51 approach is its requirement of prior knowledge about the minimum andmaximum values a policy can attain as well as the number of bins used, which fixes the resolution ofthe distributional estimate. While the DeepMind control suite of tasks utilizes standardized rewards and episode lengths, thus enabling the entire suite to be solved with a single setting of these hyperparameters, this is often not the case. This paper revisits a natural alternative that removes this requirement, namelya mixture of Gaussians, and a simple sample-based loss function to train it in an off-policy regime. We empirically evaluate its performance on a broad range of continuous control tasks and demonstrate that it eliminates the need for these distributional hyperparameters and achieves state-of-the-art performance on a variety of challenging tasks (e.g. the humanoid, dog, quadruped, and manipulator domains). Finallywe provide an implementation in the Acme agent repository.  ( 2 min )
    IIITDWD-ShankarB@ Dravidian-CodeMixi-HASOC2021: mBERT based model for identification of offensive content in south Indian languages. (arXiv:2204.10195v1 [cs.CL])
    In recent years, there has been a lot of focus on offensive content. The amount of offensive content generated by social media is increasing at an alarming rate. This created a greater need to address this issue than ever before. To address these issues, the organizers of "Dravidian-Code Mixed HASOC-2020" have created two challenges. Task 1 involves identifying offensive content in Malayalam data, whereas Task 2 includes Malayalam and Tamil Code Mixed Sentences. Our team participated in Task 2. In our suggested model, we experiment with multilingual BERT to extract features, and three different classifiers are used on extracted features. Our model received a weighted F1 score of 0.70 for Malayalam data and was ranked fifth; we also received a weighted F1 score of 0.573 for Tamil Code Mixed data and were ranked eleventh.  ( 2 min )
    Unsupervised Numerical Reasoning to Extract Phenotypes from Clinical Text by Leveraging External Knowledge. (arXiv:2204.10202v1 [cs.CL])
    Extracting phenotypes from clinical text has been shown to be useful for a variety of clinical use cases such as identifying patients with rare diseases. However, reasoning with numerical values remains challenging for phenotyping in clinical text, for example, temperature 102F representing Fever. Current state-of-the-art phenotyping models are able to detect general phenotypes, but perform poorly when they detect phenotypes requiring numerical reasoning. We present a novel unsupervised methodology leveraging external knowledge and contextualized word embeddings from ClinicalBERT for numerical reasoning in a variety of phenotypic contexts. Comparing against unsupervised benchmarks, it shows a substantial performance improvement with absolute gains on generalized Recall and F1 scores up to 79% and 71%, respectively. In the supervised setting, it also surpasses the performance of alternative approaches with absolute gains on generalized Recall and F1 scores up to 70% and 44%, respectively.  ( 2 min )
    Is Neuron Coverage Needed to Make Person Detection More Robust?. (arXiv:2204.10027v1 [cs.CV])
    The growing use of deep neural networks (DNNs) in safety- and security-critical areas like autonomous driving raises the need for their systematic testing. Coverage-guided testing (CGT) is an approach that applies mutation or fuzzing according to a predefined coverage metric to find inputs that cause misbehavior. With the introduction of a neuron coverage metric, CGT has also recently been applied to DNNs. In this work, we apply CGT to the task of person detection in crowded scenes. The proposed pipeline uses YOLOv3 for person detection and includes finding DNN bugs via sampling and mutation, and subsequent DNN retraining on the updated training set. To be a bug, we require a mutated image to cause a significant performance drop compared to a clean input. In accordance with the CGT, we also consider an additional requirement of increased coverage in the bug definition. In order to explore several types of robustness, our approach includes natural image transformations, corruptions, and adversarial examples generated with the Daedalus attack. The proposed framework has uncovered several thousand cases of incorrect DNN behavior. The relative change in mAP performance of the retrained models reached on average between 26.21\% and 64.24\% for different robustness types. However, we have found no evidence that the investigated coverage metrics can be advantageously used to improve robustness.  ( 2 min )
    Evolution and use of data science vocabulary. How much have we changed in 13 years?. (arXiv:2204.10174v1 [cs.DL])
    Here I present an investigation on the evolution and use of vocabulary in data science in the last 13 years. Based on a rigorous statistical analysis, a database with 12,787 documents containing the words "data science" in the title, abstract or keywords is analyzed. It is proposed to classify the evolution of this discipline in three periods: emergence, growth and boom. Characteristic words and pioneering documents are identified for each period. By proposing the distinctive vocabulary and relevant topics of data science and classified in time periods, these results add value to the scientific community of this discipline.  ( 2 min )
    Detecting Topology Attacks against Graph Neural Networks. (arXiv:2204.10072v1 [cs.LG])
    Graph neural networks (GNNs) have been widely used in many real applications, and recent studies have revealed their vulnerabilities against topology attacks. To address this issue, existing efforts have mainly been dedicated to improving the robustness of GNNs, while little attention has been paid to the detection of such attacks. In this work, we study the victim node detection problem under topology attacks against GNNs. Our approach is built upon the key observation rooted in the intrinsic message passing nature of GNNs. That is, the neighborhood of a victim node tends to have two competing group forces, pushing the node classification results towards the original label and the targeted label, respectively. Based on this observation, we propose to detect victim nodes by deliberately designing an effective measurement of the neighborhood variance for each node. Extensive experimental results on four real-world datasets and five existing topology attacks show the effectiveness and efficiency of the proposed detection approach.  ( 2 min )
    Working memory inspired hierarchical video decomposition with transformative representations. (arXiv:2204.10105v1 [cs.CV])
    Video decomposition is very important to extract moving foreground objects from complex backgrounds in computer vision, machine learning, and medical imaging, e.g., extracting moving contrast-filled vessels from the complex and noisy backgrounds of X-ray coronary angiography (XCA). However, the challenges caused by dynamic backgrounds, overlapping heterogeneous environments and complex noises still exist in video decomposition. To solve these problems, this study is the first to introduce a flexible visual working memory model in video decomposition tasks to provide interpretable and high-performance hierarchical deep architecture, integrating the transformative representations between sensory and control layers from the perspective of visual and cognitive neuroscience. Specifically, robust PCA unrolling networks acting as a structure-regularized sensor layer decompose XCA into sparse/low-rank structured representations to separate moving contrast-filled vessels from noisy and complex backgrounds. Then, patch recurrent convolutional LSTM networks with a backprojection module embody unstructured random representations of the control layer in working memory, recurrently projecting spatiotemporally decomposed nonlocal patches into orthogonal subspaces for heterogeneous vessel retrieval and interference suppression. This video decomposition deep architecture effectively restores the heterogeneous profiles of intensity and the geometries of moving objects against the complex background interferences. Experiments show that the proposed method significantly outperforms state-of-the-art methods in accurate moving contrast-filled vessel extraction with excellent flexibility and computational efficiency.  ( 2 min )
    Robustness of Machine Learning Models Beyond Adversarial Attacks. (arXiv:2204.10046v1 [cs.LG])
    Correctly quantifying the robustness of machine learning models is a central aspect in judging their suitability for specific tasks, and thus, ultimately, for generating trust in the models. We show that the widely used concept of adversarial robustness and closely related metrics based on counterfactuals are not necessarily valid metrics for determining the robustness of ML models against perturbations that occur "naturally", outside specific adversarial attack scenarios. Additionally, we argue that generic robustness metrics in principle are insufficient for determining real-world-robustness. Instead we propose a flexible approach that models possible perturbations in input data individually for each application. This is then combined with a probabilistic approach that computes the likelihood that a real-world perturbation will change a prediction, thus giving quantitative information of the robustness of the trained machine learning model. The method does not require access to the internals of the classifier and thus in principle works for any black-box model. It is, however, based on Monte-Carlo sampling and thus only suited for input spaces with small dimensions. We illustrate our approach on two dataset, as well as on analytically solvable cases. Finally, we discuss ideas on how real-world robustness could be computed or estimated in high-dimensional input spaces.  ( 2 min )
    Fluctuation-based Outlier Detection. (arXiv:2204.10007v1 [cs.LG])
    Outlier detection is an important topic in machine learning and has been used in a wide range of applications. Outliers are objects that are few in number and deviate from the majority of objects. As a result of these two properties, we show that outliers are susceptible to a mechanism called fluctuation. This article proposes a method called fluctuation-based outlier detection (FBOD) that achieves a low linear time complexity and detects outliers purely based on the concept of fluctuation without employing any distance, density or isolation measure. Fundamentally different from all existing methods. FBOD first converts the Euclidean structure datasets into graphs by using random links, then propagates the feature value according to the connection of the graph. Finally, by comparing the difference between the fluctuation of an object and its neighbors, FBOD determines the object with a larger difference as an outlier. The results of experiments comparing FBOD with seven state-of-the-art algorithms on eight real-world tabular datasets and three video datasets show that FBOD outperforms its competitors in the majority of cases and that FBOD has only 5% of the execution time of the fastest algorithm. The experiment codes are available at: https://github.com/FluctuationOD/Fluctuation-based-Outlier-Detection.  ( 2 min )
    Multi-Tier Platform for Cognizing Massive Electroencephalogram. (arXiv:2204.09840v1 [eess.SP])
    An end-to-end platform assembling multiple tiers is built for precisely cognizing brain activities. Being fed massive electroencephalogram (EEG) data, the time-frequency spectrograms are conventionally projected into the episode-wise feature matrices (seen as tier-1). A spiking neural network (SNN) based tier is designed to distill the principle information in terms of spike-streams from the rare features, which maintains the temporal implication in the nature of EEGs. The proposed tier-3 transposes time- and space-domain of spike patterns from the SNN; and feeds the transposed pattern-matrices into an artificial neural network (ANN, Transformer specifically) known as tier-4, where a special spanning topology is proposed to match the two-dimensional input form. In this manner, cognition such as classification is conducted with high accuracy. For proof-of-concept, the sleep stage scoring problem is demonstrated by introducing multiple EEG datasets with the largest comprising 42,560 hours recorded from 5,793 subjects. From experiment results, our platform achieves the general cognition overall accuracy of 87% by leveraging sole EEG, which is 2% superior to the state-of-the-art. Moreover, our developed multi-tier methodology offers visible and graphical interpretations of the temporal characteristics of EEG by identifying the critical episodes, which is demanded in neurodynamics but hardly appears in conventional cognition scenarios.  ( 2 min )
    A data filling methodology for time series based on CNN and (Bi)LSTM neural networks. (arXiv:2204.09994v1 [cs.LG])
    In the process of collecting data from sensors, several circumstances can affect their continuity and validity, resulting in alterations of the data or loss of information. Although classical methods of statistics, such as interpolation-like techniques, can be used to approximate the missing data in a time series, the recent developments in Deep Learning (DL) have given impetus to innovative and much more accurate forecasting techniques. In the present paper, we develop two DL models aimed at filling data gaps, for the specific case of internal temperature time series obtained from monitored apartments located in Bolzano, Italy. The DL models developed in the present work are based on the combination of Convolutional Neural Networks (CNNs), Long Short-Term Memory Neural Networks (LSTMs), and Bidirectional LSTMs (BiLSTMs). Two key features of our models are the use of both pre- and post-gap data, and the exploitation of a correlated time series (the external temperature) in order to predict the target one (the internal temperature). Our approach manages to capture the fluctuating nature of the data and shows good accuracy in reconstructing the target time series. In addition, our models significantly improve the already good results from another DL architecture that is used as a baseline for the present work.  ( 2 min )
    A Learned Index for Exact Similarity Search in Metric Spaces. (arXiv:2204.10028v1 [cs.DB])
    Indexing is an effective way to support efficient query processing in large databases. Recently the concept of learned index has been explored actively to replace or supplement traditional index structures with machine learning models to reduce storage and search costs. However, accurate and efficient similarity query processing in high-dimensional metric spaces remains to be an open challenge. In this paper, a novel indexing approach called LIMS is proposed to use data clustering and pivot-based data transformation techniques to build learned indexes for efficient similarity query processing in metric spaces. The underlying data is partitioned into clusters such that each cluster follows a relatively uniform data distribution. Data redistribution is achieved by utilizing a small number of pivots for each cluster. Similar data are mapped into compact regions and the mapped values are totally ordinal. Machine learning models are developed to approximate the position of each data record on the disk. Efficient algorithms are designed for processing range queries and nearest neighbor queries based on LIMS, and for index maintenance with dynamic updates. Extensive experiments on real-world and synthetic datasets demonstrate the superiority of LIMS compared with traditional indexes and state-of-the-art learned indexes.  ( 2 min )
    Inducing Gaussian Process Networks. (arXiv:2204.09889v1 [cs.LG])
    Gaussian processes (GPs) are powerful but computationally expensive machine learning models, requiring an estimate of the kernel covariance matrix for every prediction. In large and complex domains, such as graphs, sets, or images, the choice of suitable kernel can also be non-trivial to determine, providing an additional obstacle to the learning task. Over the last decade, these challenges have resulted in significant advances being made in terms of scalability and expressivity, exemplified by, e.g., the use of inducing points and neural network kernel approximations. In this paper, we propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points. The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains while also facilitating scalable gradient-based learning methods. We consider both regression and (binary) classification tasks and report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods. We also demonstrate how IGNs can be used to effectively model complex domains using neural network architectures.  ( 2 min )
    Deep transfer learning for partial differential equations under conditional shift with DeepONet. (arXiv:2204.09810v1 [cs.LG])
    Traditional machine learning algorithms are designed to learn in isolation, i.e. address single tasks. The core idea of transfer learning (TL) is that knowledge gained in learning to perform one task (source) can be leveraged to improve learning performance in a related, but different, task (target). TL leverages and transfers previously acquired knowledge to address the expense of data acquisition and labeling, potential computational power limitations, and the dataset distribution mismatches. Although significant progress has been made in the fields of image processing, speech recognition, and natural language processing (for classification and regression) for TL, little work has been done in the field of scientific machine learning for functional regression and uncertainty quantification in partial differential equations. In this work, we propose a novel TL framework for task-specific learning under conditional shift with a deep operator network (DeepONet). Inspired by the conditional embedding operator theory, we measure the statistical distance between the source domain and the target feature domain by embedding conditional distributions onto a reproducing kernel Hilbert space. Task-specific operator learning is accomplished by fine-tuning task-specific layers of the target DeepONet using a hybrid loss function that allows for the matching of individual target samples while also preserving the global properties of the conditional distribution of target data. We demonstrate the advantages of our approach for various TL scenarios involving nonlinear PDEs under conditional shift. Our results include geometry domain adaptation and show that the proposed TL framework enables fast and efficient multi-task operator learning, despite significant differences between the source and target domains.  ( 2 min )
    Fairness in Graph Mining: A Survey. (arXiv:2204.09888v1 [cs.LG])
    Graph mining algorithms have been playing a significant role in myriad fields over the years. However, despite their promising performance on various graph analytical tasks, most of these algorithms lack fairness considerations. As a consequence, they could lead to discrimination towards certain populations when exploited in human-centered applications. Recently, algorithmic fairness has been extensively studied in graph-based applications. In contrast to algorithmic fairness on independent and identically distributed (i.i.d.) data, fairness in graph mining has exclusive backgrounds, taxonomies, and fulfilling techniques. In this survey, we provide a comprehensive and up-to-date introduction of existing literature under the context of fair graph mining. Specifically, we propose a novel taxonomy of fairness notions on graphs, which sheds light on their connections and differences. We further present an organized summary of existing techniques that promote fairness in graph mining. Finally, we summarize the widely used datasets in this emerging research field and provide insights on current research challenges and open questions, aiming at encouraging cross-breeding ideas and further advances.  ( 2 min )
    Ultra Marginal Feature Importance. (arXiv:2204.09938v1 [stat.ML])
    Scientists frequently prioritize learning from data rather than training the best possible model; however, research in machine learning often prioritizes the latter. The development of marginal feature importance methods, such as marginal contribution feature importance, attempts to break this trend by providing a useful framework for explaining relationships in data in an interpretable fashion. In this work, we generalize the framework of marginal contribution feature importance to improve performance with regards to detecting correlated interactions and reducing runtime. To do so, we consider "information subsets" of the set of features $F$ and show that our importance metric can be computed directly after applying fair representation learning methods from the AI fairness literature. The methods of optimal transport and linear regression are considered and explored experimentally for removing all the information of our feature of interest $f$ from the feature set $F$. Given these implementations, we show on real and simulated data that ultra marginal feature importance performs at least as well as marginal contribution feature importance, with substantially faster computation time and better performance in the presence of correlated interactions and unrelated features.  ( 2 min )
    Perception Visualization: Seeing Through the Eyes of a DNN. (arXiv:2204.09920v1 [cs.CV])
    Artificial intelligence (AI) systems power the world we live in. Deep neural networks (DNNs) are able to solve tasks in an ever-expanding landscape of scenarios, but our eagerness to apply these powerful models leads us to focus on their performance and deprioritises our ability to understand them. Current research in the field of explainable AI tries to bridge this gap by developing various perturbation or gradient-based explanation techniques. For images, these techniques fail to fully capture and convey the semantic information needed to elucidate why the model makes the predictions it does. In this work, we develop a new form of explanation that is radically different in nature from current explanation methods, such as Grad-CAM. Perception visualization provides a visual representation of what the DNN perceives in the input image by depicting what visual patterns the latent representation corresponds to. Visualizations are obtained through a reconstruction model that inverts the encoded features, such that the parameters and predictions of the original models are not modified. Results of our user study demonstrate that humans can better understand and predict the system's decisions when perception visualizations are available, thus easing the debugging and deployment of deep models as trusted systems.  ( 2 min )
    GUARD: Graph Universal Adversarial Defense. (arXiv:2204.09803v1 [cs.LG])
    Recently, graph convolutional networks (GCNs) have shown to be vulnerable to small adversarial perturbations, which becomes a severe threat and largely limits their applications in security-critical scenarios. To mitigate such a threat, considerable research efforts have been devoted to increasing the robustness of GCNs against adversarial attacks. However, current approaches for defense are typically designed for the whole graph and consider the global performance, posing challenges in protecting important local nodes from stronger adversarial targeted attacks. In this work, we present a simple yet effective method, named \textbf{\underline{G}}raph \textbf{\underline{U}}niversal \textbf{\underline{A}}dve\textbf{\underline{R}}sarial \textbf{\underline{D}}efense (GUARD). Unlike previous works, GUARD protects each individual node from attacks with a universal defensive patch, which is generated once and can be applied to any node (node-agnostic) in a graph. Extensive experiments on four benchmark datasets demonstrate that our method significantly improves robustness for several established GCNs against multiple adversarial attacks and outperforms existing adversarial defense methods by large margins. Our code is publicly available at https://github.com/EdisonLeeeee/GUARD.  ( 2 min )
    MedFACT: Modeling Medical Feature Correlations in Patient Health Representation Learning via Feature Clustering. (arXiv:2204.10011v1 [cs.LG])
    In healthcare prediction tasks, it is essential to exploit the correlations between medical features and learn better patient health representations. Existing methods try to estimate feature correlations only from data, or increase the quality of estimation by introducing task-specific medical knowledge. However, such methods either are difficult to estimate the feature correlations due to insufficient training samples, or cannot be generalized to other tasks due to reliance on specific knowledge. There are medical research revealing that not all the medical features are strongly correlated. Thus, to address the issues, we expect to group up strongly correlated features and learn feature correlations in a group-wise manner to reduce the learning complexity without losing generality. In this paper, we propose a general patient health representation learning framework MedFACT. We estimate correlations via measuring similarity between temporal patterns of medical features with kernel methods, and cluster features with strong correlations into groups. The feature group is further formulated as a correlation graph, and we employ graph convolutional networks to conduct group-wise feature interactions for better representation learning. Experiments on two real-world datasets demonstrate the superiority of MedFACT. The discovered medical findings are also confirmed by literature, providing valuable medical insights and explanations.  ( 2 min )
    fairDMS: Rapid Model Training by Data and Model Reuse. (arXiv:2204.09805v1 [cs.LG])
    Extracting actionable information from data sources such as the Linac Coherent Light Source (LCLS-II) and Advanced Photon Source Upgrade (APS-U) is becoming more challenging due to the fast-growing data generation rate. The rapid analysis possible with ML methods can enable fast feedback loops that can be used to adjust experimental setups in real-time, for example when errors occur or interesting events are detected. However, to avoid degradation in ML performance over time due to changes in an instrument or sample, we need a way to update ML models rapidly while an experiment is running. We present here a data service and model service to accelerate deep neural network training with a focus on ML-based scientific applications. Our proposed data service achieves 100x speedup in terms of data labeling compare to the current state-of-the-art. Further, our model service achieves up to 200x improvement in training speed. Overall, fairDMS achieves up to 92x speedup in terms of end-to-end model updating time.  ( 2 min )
    Sample-Efficient Reinforcement Learning for POMDPs with Linear Function Approximations. (arXiv:2204.09787v1 [cs.LG])
    Despite the success of reinforcement learning (RL) for Markov decision processes (MDPs) with function approximation, most RL algorithms easily fail if the agent only has partial observations of the state. Such a setting is often modeled as a partially observable Markov decision process (POMDP). Existing sample-efficient algorithms for POMDPs are restricted to the tabular setting where the state and observation spaces are finite. In this paper, we make the first attempt at tackling the tension between function approximation and partial observability. In specific, we focus on a class of undercomplete POMDPs with linear function approximations, which allows the state and observation spaces to be infinite. For such POMDPs, we show that the optimal policy and value function can be characterized by a sequence of finite-memory Bellman operators. We propose an RL algorithm that constructs optimistic estimators of these operators via reproducing kernel Hilbert space (RKHS) embedding. Moreover, we theoretically prove that the proposed algorithm finds an $\varepsilon$-optimal policy with $\tilde O (1/\varepsilon^2)$ episodes of exploration. Also, this sample complexity only depends on the intrinsic dimension of the POMDP polynomially and is independent of the size of the state and observation spaces. To our best knowledge, we develop the first provably sample-efficient algorithm for POMDPs with function approximation.  ( 2 min )
    Scaling Language Model Size in Cross-Device Federated Learning. (arXiv:2204.09715v1 [cs.CL])
    Most studies in cross-device federated learning focus on small models, due to the server-client communication and on-device computation bottlenecks. In this work, we leverage various techniques for mitigating these bottlenecks to train larger language models in cross-device federated learning. With systematic applications of partial model training, quantization, efficient transfer learning, and communication-efficient optimizers, we are able to train a $21$M parameter Transformer that achieves the same perplexity as that of a similarly sized LSTM with $\sim10\times$ smaller client-to-server communication cost and $11\%$ lower perplexity than smaller LSTMs commonly studied in literature.  ( 2 min )
    A Hierarchical Bayesian Approach to Inverse Reinforcement Learning with Symbolic Reward Machines. (arXiv:2204.09772v1 [cs.AI])
    A misspecified reward can degrade sample efficiency and induce undesired behaviors in reinforcement learning (RL) problems. We propose symbolic reward machines for incorporating high-level task knowledge when specifying the reward signals. Symbolic reward machines augment existing reward machine formalism by allowing transitions to carry predicates and symbolic reward outputs. This formalism lends itself well to inverse reinforcement learning, whereby the key challenge is determining appropriate assignments to the symbolic values from a few expert demonstrations. We propose a hierarchical Bayesian approach for inferring the most likely assignments such that the concretized reward machine can discriminate expert demonstrated trajectories from other trajectories with high accuracy. Experimental results show that learned reward machines can significantly improve training efficiency for complex RL tasks and generalize well across different task environment configurations.  ( 2 min )
    Federated Learning for Energy-limited Wireless Networks: A Partial Model Aggregation Approach. (arXiv:2204.09746v1 [cs.LG])
    The limited communication resources, e.g., bandwidth and energy, and data heterogeneity across devices are two of the main bottlenecks for federated learning (FL). To tackle these challenges, we first devise a novel FL framework with partial model aggregation (PMA), which only aggregates the lower layers of neural networks responsible for feature extraction while the upper layers corresponding to complex pattern recognition remain at devices for personalization. The proposed PMA-FL is able to address the data heterogeneity and reduce the transmitted information in wireless channels. We then obtain a convergence bound of the framework under a non-convex loss function setting. With the aid of this bound, we define a new objective function, named the scheduled data sample volume, to transfer the original inexplicit optimization problem into a tractable one for device scheduling, bandwidth allocation, computation and communication time division. Our analysis reveals that the optimal time division is achieved when the communication and computation parts of PMA-FL have the same power. We also develop a bisection method to solve the optimal bandwidth allocation policy and use the set expansion algorithm to address the optimal device scheduling. Compared with the state-of-the-art benchmarks, the proposed PMA-FL improves 2.72% and 11.6% accuracy on two typical heterogeneous datasets, i.e., MINIST and CIFAR-10, respectively. In addition, the proposed joint dynamic device scheduling and resource optimization approach achieve slightly higher accuracy than the considered benchmarks, but they provide a satisfactory energy and time reduction: 29% energy or 20% time reduction on the MNIST; and 25% energy or 12.5% time reduction on the CIFAR-10.  ( 2 min )
    Exact Formulas for Finite-Time Estimation Errors of Decentralized Temporal Difference Learning with Linear Function Approximation. (arXiv:2204.09801v1 [cs.LG])
    In this paper, we consider the policy evaluation problem in multi-agent reinforcement learning (MARL) and derive exact closed-form formulas for the finite-time mean-squared estimation errors of decentralized temporal difference (TD) learning with linear function approximation. Our analysis hinges upon the fact that the decentralized TD learning method can be viewed as a Markov jump linear system (MJLS). Then standard MJLS theory can be applied to quantify the mean and covariance matrix of the estimation error of the decentralized TD method at every time step. Various implications of our exact formulas on the algorithm performance are also discussed. An interesting finding is that under a necessary and sufficient stability condition, the mean-squared TD estimation error will converge to an exact limit at a specific exponential rate.  ( 2 min )
    Matching Writers to Content Writing Tasks. (arXiv:2204.09718v1 [cs.CL])
    Businesses need content. In various forms and formats and for varied purposes. In fact, the content marketing industry is set to be worth $412.88 billion by the end of 2021. However, according to the Content Marketing Institute, creating engaging content is the #1 challenge that marketers face today. We under-stand that producing great content requires great writers who understand the business and can weave their message into reader (and search engine) friendly content. In this project, the team has attempted to bridge the gap between writers and projects by using AI and ML tools. We used NLP techniques to analyze thou-sands of publicly available business articles (corpora) to extract various defining factors for each writing sample. Through this project we aim to automate the highly time-consuming, and often biased task of manually shortlisting the most suitable writer for a given content writing requirement. We believe that a tool like this will have far reaching positive implications for both parties - businesses looking for suitable talent for niche writing jobs as well as experienced writers and Subject Matter Experts (SMEs) wanting to lend their services to content marketing projects. The business gets the content they need, the content writer/ SME gets a chance to leverage his or her talent, while the reader gets authentic content that adds real value.  ( 2 min )
    Generative Pre-Trained Transformers for Biologically Inspired Design. (arXiv:2204.09714v1 [cs.CL])
    Biological systems in nature have evolved for millions of years to adapt and survive the environment. Many features they developed can be inspirational and beneficial for solving technical problems in modern industries. This leads to a novel form of design-by-analogy called bio-inspired design (BID). Although BID as a design method has been proven beneficial, the gap between biology and engineering continuously hinders designers from effectively applying the method. Therefore, we explore the recent advance of artificial intelligence (AI) for a computational approach to bridge the gap. This paper proposes a generative design approach based on the pre-trained language model (PLM) to automatically retrieve and map biological analogy and generate BID in the form of natural language. The latest generative pre-trained transformer, namely GPT-3, is used as the base PLM. Three types of design concept generators are identified and fine-tuned from the PLM according to the looseness of the problem space representation. Machine evaluators are also fine-tuned to assess the correlation between the domains within the generated BID concepts. The approach is then tested via a case study in which the fine-tuned models are applied to generate and evaluate light-weighted flying car concepts inspired by nature. The results show our approach can generate BID concepts with good performance.  ( 2 min )
    A majorization-minimization algorithm for nonnegative binary matrix factorization. (arXiv:2204.09741v1 [cs.LG])
    This paper tackles the problem of decomposing binary data using matrix factorization. We consider the family of mean-parametrized Bernoulli models, a class of generative models that are well suited for modeling binary data and enables interpretability of the factors. We factorize the Bernoulli parameter and consider an additional Beta prior on one of the factors to further improve the model's expressive power. While similar models have been proposed in the literature, they only exploit the Beta prior as a proxy to ensure a valid Bernoulli parameter in a Bayesian setting; in practice it reduces to a uniform or uninformative prior. Besides, estimation in these models has focused on costly Bayesian inference. In this paper, we propose a simple yet very efficient majorization-minimization algorithm for maximum a posteriori estimation. Our approach leverages the Beta prior whose parameters can be tuned to improve performance in matrix completion tasks. Experiments conducted on three public binary datasets show that our approach offers an excellent trade-off between prediction performance, computational complexity, and interpretability.  ( 2 min )
    FS-NCSR: Increasing Diversity of the Super-Resolution Space via Frequency Separation and Noise-Conditioned Normalizing Flow. (arXiv:2204.09679v1 [cs.CV])
    Super-resolution suffers from an innate ill-posed problem that a single low-resolution (LR) image can be from multiple high-resolution (HR) images. Recent studies on the flow-based algorithm solve this ill-posedness by learning the super-resolution space and predicting diverse HR outputs. Unfortunately, the diversity of the super-resolution outputs is still unsatisfactory, and the outputs from the flow-based model usually suffer from undesired artifacts which causes low-quality outputs. In this paper, we propose FS-NCSR which produces diverse and high-quality super-resolution outputs using frequency separation and noise conditioning compared to the existing flow-based approaches. As the sharpness and high-quality detail of the image rely on its high-frequency information, FS-NCSR only estimates the high-frequency information of the high-resolution outputs without redundant low-frequency components. Through this, FS-NCSR significantly improves the diversity score without significant image quality degradation compared to the NCSR, the winner of the previous NTIRE 2021 challenge.  ( 2 min )
  • Open

    Deep Learning meets Nonparametric Regression: Are Weight-Decayed DNNs Locally Adaptive?. (arXiv:2204.09664v2 [cs.LG] UPDATED)
    We study the theory of neural network (NN) from the lens of classical nonparametric regression problems with a focus on NN's ability to adaptively estimate functions with heterogeneous smoothness -- a property of functions in Besov or Bounded Variation (BV) classes. Existing work on this problem requires tuning the NN architecture based on the function spaces and sample sizes. We consider a "Parallel NN" variant of deep ReLU networks and show that the standard weight decay is equivalent to promoting the $\ell_p$-sparsity ($0<p<1$) of the coefficient vector of an end-to-end learned function bases, i.e., a dictionary. Using this equivalence, we further establish that by tuning only the weight decay, such Parallel NN achieves an estimation error arbitrarily close to the minimax rates for both the Besov and BV classes. Notably, it gets exponentially closer to minimax optimal as the NN gets deeper. Our research sheds new lights on why depth matters and how NNs are more powerful than kernel methods.  ( 2 min )
    The Silent Problem -- Machine Learning Model Failure -- How to Diagnose and Fix Ailing Machine Learning Models. (arXiv:2204.10227v1 [cs.LG])
    The COVID-19 pandemic has dramatically changed how healthcare is delivered to patients, how patients interact with healthcare providers, and how healthcare information is disseminated to both healthcare providers and patients. Analytical models that were trained and tested pre-pandemic may no longer be performing up to expectations, providing unreliable and irrelevant learning (ML) models given that ML depends on the basic principle that what happened in the past are likely to repeat in the future. ML faced to two important degradation principles, concept drift, when the underlying properties and characteristics of the variables change and data drift, when the data distributions, probabilities, co-variates, and other variable relationships change, both of which are prime culprits of model failure. Therefore, detecting and diagnosing drift in existing models is something that has become an imperative. And perhaps even more important is a shift in our mindset towards a conscious recognition that drift is inevitable, and model building must incorporate intentional resilience, the ability to offset and recover quickly from failure, and proactive robustness, avoiding failure by developing models that are less vulnerable to drift and disruption.  ( 2 min )
    Strong posterior contraction rates via Wasserstein dynamics. (arXiv:2203.10754v2 [math.ST] UPDATED)
    In this paper, we develop a novel approach to posterior contractions rates (PCRs), for both finite-dimensional (parametric) and infinite-dimensional (nonparametric) Bayesian models. Critical to our approach is the combination of an assumption of local Lipschitz-continuity for the posterior distribution with a dynamic formulation of the Wasserstein distance, here referred to as Wasserstein dynamics, which allows to set forth a connection between the problem of establishing PCRs and some classical problems in mathematical analysis, probability theory and mathematical statistics: the Laplace method for approximating integrals, Sanov's large deviation principles in the Wasserstein distance, rates of convergence of the mean Glivenko-Cantelli theorem, and estimates of weighted Poincar\'e-Wirtinger constants. Under dominated Bayesian models, we present two main results: i) a theorem on PCRs for the regular infinite-dimensional exponential family of statistical models; ii) a theorem on PCRs for a general dominated statistical model. Some applications of our results are presented for the regular parametric model, the multinomial model, the finite-dimensional and the infinite-dimensional logistic-Gaussian model and the infinite-dimensional linear regression. In general, our results lead to optimal PCRs in finite dimension, whereas in infinite dimension it is shown how the prior distribution may affect PCRs. With regards to infinite-dimensional Bayesian models for density estimation, our approach to PCRs is the first to consider strong norm distances on parameter spaces of functions, such as Sobolev-like norms, as most of the approaches in the classical (frequentist) and Bayesian literature deal with spaces of density functions endowed with $\mathrm{L}^p$ norms or the Hellinger distance.  ( 2 min )
    Towards Resolving Propensity Contradiction in Offline Recommender Learning. (arXiv:1910.07295v6 [stat.ML] UPDATED)
    We study offline recommender learning from explicit rating feedback in the presence of selection bias. A current promising solution for the bias is the inverse propensity score (IPS) estimation. However, the performance of existing propensity-based methods can suffer significantly from the propensity estimation bias. In fact, most of the previous IPS-based methods require some amount of missing-completely-at-random (MCAR) data to accurately estimate the propensity. This leads to a critical self-contradiction; IPS is ineffective without MCAR data, even though it originally aims to learn recommenders from only missing-not-at-random feedback. To resolve this propensity contradiction, we derive a propensity-independent generalization error bound and propose a novel algorithm to minimize the theoretical bound via adversarial learning. Our theory and algorithm do not require a propensity estimation procedure, thereby leading to a well-performing rating predictor without the true propensity information. Extensive experiments demonstrate that the proposed approach is superior to a range of existing methods both in rating prediction and ranking metrics in practical settings without MCAR data.  ( 2 min )
    Wrapped Distributions on homogeneous Riemannian manifolds. (arXiv:2204.09790v1 [math.ST])
    We provide a general framework for constructing probability distributions on Riemannian manifolds, taking advantage of area-preserving maps and isometries. Control over distributions' properties, such as parameters, symmetry and modality yield a family of flexible distributions that are straightforward to sample from, suitable for use within Monte Carlo algorithms and latent variable models, such as autoencoders. As an illustration, we empirically validate our approach by utilizing our proposed distributions within a variational autoencoder and a latent space network model. Finally, we take advantage of the generalized description of this framework to posit questions for future work.  ( 2 min )
    Sample-Efficient Reinforcement Learning for POMDPs with Linear Function Approximations. (arXiv:2204.09787v1 [cs.LG])
    Despite the success of reinforcement learning (RL) for Markov decision processes (MDPs) with function approximation, most RL algorithms easily fail if the agent only has partial observations of the state. Such a setting is often modeled as a partially observable Markov decision process (POMDP). Existing sample-efficient algorithms for POMDPs are restricted to the tabular setting where the state and observation spaces are finite. In this paper, we make the first attempt at tackling the tension between function approximation and partial observability. In specific, we focus on a class of undercomplete POMDPs with linear function approximations, which allows the state and observation spaces to be infinite. For such POMDPs, we show that the optimal policy and value function can be characterized by a sequence of finite-memory Bellman operators. We propose an RL algorithm that constructs optimistic estimators of these operators via reproducing kernel Hilbert space (RKHS) embedding. Moreover, we theoretically prove that the proposed algorithm finds an $\varepsilon$-optimal policy with $\tilde O (1/\varepsilon^2)$ episodes of exploration. Also, this sample complexity only depends on the intrinsic dimension of the POMDP polynomially and is independent of the size of the state and observation spaces. To our best knowledge, we develop the first provably sample-efficient algorithm for POMDPs with function approximation.
    Computationally Efficient and Statistically Optimal Robust Low-rank Matrix and Tensor Estimation. (arXiv:2203.00953v3 [math.ST] UPDATED)
    Low-rank matrix estimation under heavy-tailed noise is challenging, both computationally and statistically. Convex approaches have been proven statistically optimal but suffer from high computational costs, especially since robust loss functions are usually non-smooth. More recently, computationally fast non-convex approaches via sub-gradient descent are proposed, which, unfortunately, fail to deliver a statistically consistent estimator even under sub-Gaussian noise. In this paper, we introduce a novel Riemannian sub-gradient (RsGrad) algorithm which is not only computationally efficient with linear convergence but also is statistically optimal, be the noise Gaussian or heavy-tailed. Convergence theory is established for a general framework and specific applications to absolute loss, Huber loss, and quantile loss are investigated. Compared with existing non-convex methods, ours reveals a surprising phenomenon of dual-phase convergence. In phase one, RsGrad behaves as in a typical non-smooth optimization that requires gradually decaying stepsizes. However, phase one only delivers a statistically sub-optimal estimator which is already observed in the existing literature. Interestingly, during phase two, RsGrad converges linearly as if minimizing a smooth and strongly convex objective function and thus a constant stepsize suffices. Underlying the phase-two convergence is the smoothing effect of random noise to the non-smooth robust losses in an area close but not too close to the truth. Lastly, RsGrad is applicable for low-rank tensor estimation under heavy-tailed noise where a statistically optimal rate is attainable with the same phenomenon of dual-phase convergence, and a novel shrinkage-based second-order moment method is guaranteed to deliver a warm initialization. Numerical simulations confirm our theoretical discovery and showcase the superiority of RsGrad over prior methods.
    Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning. (arXiv:2106.09226v2 [cs.LG] UPDATED)
    Pretrained language models have achieved state-of-the-art performance when adapted to a downstream NLP task. However, theoretical analysis of these models is scarce and challenging since the pretraining and downstream tasks can be very different. We propose an analysis framework that links the pretraining and downstream tasks with an underlying latent variable generative model of text -- the downstream classifier must recover a function of the posterior distribution over the latent variables. We analyze head tuning (learning a classifier on top of the frozen pretrained model) and prompt tuning in this setting. The generative model in our analysis is either a Hidden Markov Model (HMM) or an HMM augmented with a latent memory component, motivated by long-term dependencies in natural language. We show that 1) under certain non-degeneracy conditions on the HMM, simple classification heads can solve the downstream task, 2) prompt tuning obtains downstream guarantees with weaker non-degeneracy conditions, and 3) our recovery guarantees for the memory-augmented HMM are stronger than for the vanilla HMM because task-relevant information is easier to recover from the long-term memory. Experiments on synthetically generated data from HMMs back our theoretical findings.
    Bayesian Learning via Neural Schr\"odinger-F\"ollmer Flows. (arXiv:2111.10510v8 [stat.ML] UPDATED)
    In this work we explore a new framework for approximate Bayesian inference in large datasets based on stochastic control (i.e. Schr\"odinger bridges). We advocate stochastic control as a finite time and low variance alternative to popular steady-state methods such as stochastic gradient Langevin dynamics (SGLD). Furthermore, we discuss and adapt the existing theoretical guarantees of this framework and establish connections to already existing VI routines in SDE-based models.
    From Stars to Subgraphs: Uplifting Any GNN with Local Structure Awareness. (arXiv:2110.03753v3 [cs.LG] UPDATED)
    Message Passing Neural Networks (MPNNs) are a common type of Graph Neural Network (GNN), in which each node's representation is computed recursively by aggregating representations (messages) from its immediate neighbors akin to a star-shaped pattern. MPNNs are appealing for being efficient and scalable, how-ever their expressiveness is upper-bounded by the 1st-order Weisfeiler-Lehman isomorphism test (1-WL). In response, prior works propose highly expressive models at the cost of scalability and sometimes generalization performance. Our work stands between these two regimes: we introduce a general framework to uplift any MPNN to be more expressive, with limited scalability overhead and greatly improved practical performance. We achieve this by extending local aggregation in MPNNs from star patterns to general subgraph patterns (e.g.,k-egonets):in our framework, each node representation is computed as the encoding of a surrounding induced subgraph rather than encoding of immediate neighbors only (i.e. a star). We choose the subgraph encoder to be a GNN (mainly MPNNs, considering scalability) to design a general framework that serves as a wrapper to up-lift any GNN. We call our proposed method GNN-AK(GNN As Kernel), as the framework resembles a convolutional neural network by replacing the kernel with GNNs. Theoretically, we show that our framework is strictly more powerful than 1&2-WL, and is not less powerful than 3-WL. We also design subgraph sampling strategies which greatly reduce memory footprint and improve speed while maintaining performance. Our method sets new state-of-the-art performance by large margins for several well-known graph ML tasks; specifically, 0.08 MAE on ZINC,74.79% and 86.887% accuracy on CIFAR10 and PATTERN respectively.
    Backplay: "Man muss immer umkehren". (arXiv:1807.06919v5 [cs.LG] UPDATED)
    Model-free reinforcement learning (RL) requires a large number of trials to learn a good policy, especially in environments with sparse rewards. We explore a method to improve the sample efficiency when we have access to demonstrations. Our approach, Backplay, uses a single demonstration to construct a curriculum for a given task. Rather than starting each training episode in the environment's fixed initial state, we start the agent near the end of the demonstration and move the starting point backwards during the course of training until we reach the initial state. Our contributions are that we analytically characterize the types of environments where Backplay can improve training speed, demonstrate the effectiveness of Backplay both in large grid worlds and a complex four player zero-sum game (Pommerman), and show that Backplay compares favorably to other competitive methods known to improve sample efficiency. This includes reward shaping, behavioral cloning, and reverse curriculum generation.
    Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data. (arXiv:2009.09139v3 [cs.LG] UPDATED)
    Multi-Task Learning (MTL) networks have emerged as a promising method for transferring learned knowledge across different tasks. However, MTL must deal with challenges such as: overfitting to low resource tasks, catastrophic forgetting, and negative task transfer, or learning interference. Often, in Natural Language Processing (NLP), a separate model per task is needed to obtain the best performance. However, many fine-tuning approaches are both parameter inefficient, i.e., potentially involving one new model per task, and highly susceptible to losing knowledge acquired during pretraining. We propose a novel Transformer architecture consisting of a new conditional attention mechanism as well as a set of task-conditioned modules that facilitate weight sharing. Through this construction (a hypernetwork adapter), we achieve more efficient parameter sharing and mitigate forgetting by keeping half of the weights of a pretrained model fixed. We also use a new multi-task data sampling strategy to mitigate the negative effects of data imbalance across tasks. Using this approach, we are able to surpass single task fine-tuning methods while being parameter and data efficient (using around 66% of the data for weight updates). Compared to other BERT Large methods on GLUE, our 8-task model surpasses other Adapter methods by 2.8% and our 24-task model outperforms by 0.7-1.0% models that use MTL and single task fine-tuning. We show that a larger variant of our single multi-task model approach performs competitively across 26 NLP tasks and yields state-of-the-art results on a number of test and development sets. Our code is publicly available at https://github.com/CAMTL/CA-MTL.
    Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data. (arXiv:2010.03622v5 [cs.LG] UPDATED)
    Self-training algorithms, which train a model to fit pseudolabels predicted by another previously-learned model, have been very successful for learning with unlabeled data using neural networks. However, the current theoretical understanding of self-training only applies to linear models. This work provides a unified theoretical analysis of self-training with deep networks for semi-supervised learning, unsupervised domain adaptation, and unsupervised learning. At the core of our analysis is a simple but realistic "expansion" assumption, which states that a low probability subset of the data must expand to a neighborhood with large probability relative to the subset. We also assume that neighborhoods of examples in different classes have minimal overlap. We prove that under these assumptions, the minimizers of population objectives based on self-training and input-consistency regularization will achieve high accuracy with respect to ground-truth labels. By using off-the-shelf generalization bounds, we immediately convert this result to sample complexity guarantees for neural nets that are polynomial in the margin and Lipschitzness. Our results help explain the empirical successes of recently proposed self-training algorithms which use input consistency regularization.
    Scale Dependencies and Self-Similarity Through Wavelet Scattering Covariance. (arXiv:2204.10177v1 [physics.data-an])
    We introduce a scattering covariance matrix which provides non-Gaussian models of time-series having stationary increments. A complex wavelet transform computes signal variations at each scale. Dependencies across scales are captured by the joint covariance across time and scales of complex wavelet coefficients and their modulus. This covariance is nearly diagonalized by a second wavelet transform, which defines the scattering covariance. We show that this set of moments characterizes a wide range of non-Gaussian properties of multi-scale processes. This is analyzed for a variety of processes, including fractional Brownian motions, Poisson, multifractal random walks and Hawkes processes. We prove that self-similar processes have a scattering covariance matrix which is scale invariant. This property can be estimated numerically and defines a class of wide-sense self-similar processes. We build maximum entropy models conditioned by scattering covariance coefficients, and generate new time-series with a microcanonical sampling algorithm. Applications are shown for highly non-Gaussian financial and turbulence time-series.
    Beyond the density operator and Tr(\rho A): Exploiting the higher-order statistics of random-coefficient pure states for quantum information processing. (arXiv:2204.10031v1 [quant-ph])
    Two types of states are widely used in quantum mechanics, namely (deterministic-coefficient) pure states and statistical mixtures. A density operator can be associated with each of them. We here address a third type of states, that we previously introduced in a more restricted framework. These states generalize pure ones by replacing each of their deterministic ket coefficients by a random variable. We therefore call them Random-Coefficient Pure States, or RCPS. We analyze their properties and their relationships with both types of usual states. We show that RCPS contain much richer information than the density operator and mean of observables that we associate with them. This occurs because the latter operator only exploits the second-order statistics of the random state coefficients, whereas their higher-order statistics contain additional information. That information can be accessed in practice with the multiple-preparation procedure that we propose for RCPS, by using second-order and higher-order statistics of associated random probabilities of measurement outcomes. Exploiting these higher-order statistics opens the way to a very general approach for performing advanced quantum information processing tasks. We illustrate the relevance of this approach with a generic example, dealing with the estimation of parameters of a quantum process and thus related to quantum process tomography. This parameter estimation is performed in the non-blind (i.e. supervised) or blind (i.e. unsupervised) mode. We show that this problem cannot be solved by using only the density operator \rho of an RCPS and the associated mean value Tr(\rho A) of the operator A that corresponds to the considered physical quantity. We succeed in solving this problem by exploiting a fourth-order statistical parameter of state coefficients, in addition to second-order statistics. Numerical tests validate this result.
    Infographics Wizard: Flexible Infographics Authoring and Design Exploration. (arXiv:2204.09904v1 [cs.HC])
    Infographics are an aesthetic visual representation of information following specific design principles of human perception. Designing infographics can be a tedious process for non-experts and time-consuming, even for professional designers. With the help of designers, we propose a semi-automated infographic framework for general structured and flow-based infographic design generation. For novice designers, our framework automatically creates and ranks infographic designs for a user-provided text with no requirement for design input. However, expert designers can still provide custom design inputs to customize the infographics. We will also contribute an individual visual group (VG) designs dataset (in SVG), along with a 1k complete infographic image dataset with segmented VGs in this work. Evaluation results confirm that by using our framework, designers from all expertise levels can generate generic infographic designs faster than existing methods while maintaining the same quality as hand-designed infographics templates.
    Inducing Gaussian Process Networks. (arXiv:2204.09889v1 [cs.LG])
    Gaussian processes (GPs) are powerful but computationally expensive machine learning models, requiring an estimate of the kernel covariance matrix for every prediction. In large and complex domains, such as graphs, sets, or images, the choice of suitable kernel can also be non-trivial to determine, providing an additional obstacle to the learning task. Over the last decade, these challenges have resulted in significant advances being made in terms of scalability and expressivity, exemplified by, e.g., the use of inducing points and neural network kernel approximations. In this paper, we propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points. The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains while also facilitating scalable gradient-based learning methods. We consider both regression and (binary) classification tasks and report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods. We also demonstrate how IGNs can be used to effectively model complex domains using neural network architectures.
    Murmurations of elliptic curves. (arXiv:2204.10140v1 [math.NT])
    We investigate the average value of the $p$th Dirichlet coefficients of elliptic curves for a prime p in a fixed conductor range with given rank. Plotting this average yields a striking oscillating pattern, the details of which vary with the rank. Based on this observation, we perform various data-scientific experiments with the goal of classifying elliptic curves according to their ranks.
    A majorization-minimization algorithm for nonnegative binary matrix factorization. (arXiv:2204.09741v1 [cs.LG])
    This paper tackles the problem of decomposing binary data using matrix factorization. We consider the family of mean-parametrized Bernoulli models, a class of generative models that are well suited for modeling binary data and enables interpretability of the factors. We factorize the Bernoulli parameter and consider an additional Beta prior on one of the factors to further improve the model's expressive power. While similar models have been proposed in the literature, they only exploit the Beta prior as a proxy to ensure a valid Bernoulli parameter in a Bayesian setting; in practice it reduces to a uniform or uninformative prior. Besides, estimation in these models has focused on costly Bayesian inference. In this paper, we propose a simple yet very efficient majorization-minimization algorithm for maximum a posteriori estimation. Our approach leverages the Beta prior whose parameters can be tuned to improve performance in matrix completion tasks. Experiments conducted on three public binary datasets show that our approach offers an excellent trade-off between prediction performance, computational complexity, and interpretability.
    Scalable One-Pass Optimisation of High-Dimensional Weight-Update Hyperparameters by Implicit Differentiation. (arXiv:2110.10461v3 [cs.LG] UPDATED)
    Machine learning training methods depend plentifully and intricately on hyperparameters, motivating automated strategies for their optimisation. Many existing algorithms restart training for each new hyperparameter choice, at considerable computational cost. Some hypergradient-based one-pass methods exist, but these either cannot be applied to arbitrary optimiser hyperparameters (such as learning rates and momenta) or take several times longer to train than their base models. We extend these existing methods to develop an approximate hypergradient-based hyperparameter optimiser which is applicable to any continuous hyperparameter appearing in a differentiable model weight update, yet requires only one training episode, with no restarts. We also provide a motivating argument for convergence to the true hypergradient, and perform tractable gradient-based optimisation of independent learning rates for each model parameter. Our method performs competitively from varied random hyperparameter initialisations on several UCI datasets and Fashion-MNIST (using a one-layer MLP), Penn Treebank (using an LSTM) and CIFAR-10 (using a ResNet-18), in time only 2-3x greater than vanilla training.  ( 2 min )
    Intact-VAE: Estimating Treatment Effects under Unobserved Confounding. (arXiv:2101.06662v3 [stat.ML] UPDATED)
    NOTE: This preprint has a flawed theoretical formulation. Please avoid it and refer to the ICLR22 publication https://openreview.net/forum?id=q7n2RngwOM. Also, arXiv:2109.15062 contains some new ideas on unobserved Confounding. As an important problem of causal inference, we discuss the identification and estimation of treatment effects under unobserved confounding. Representing the confounder as a latent variable, we propose Intact-VAE, a new variant of variational autoencoder (VAE), motivated by the prognostic score that is sufficient for identifying treatment effects. We theoretically show that, under certain settings, treatment effects are identified by our model, and further, based on the identifiability of our model (i.e., determinacy of representation), our VAE is a consistent estimator with representation balanced for treatment groups. Experiments on (semi-)synthetic datasets show state-of-the-art performance under diverse settings.  ( 2 min )
    The NIST CTS Speaker Recognition Challenge. (arXiv:2204.10228v1 [eess.AS])
    The US National Institute of Standards and Technology (NIST) has been conducting a second iteration of the CTS challenge since August 2020. The current iteration of the CTS Challenge is a leaderboard-style speaker recognition evaluation using telephony data extracted from the unexposed portions of the Call My Net 2 (CMN2) and Multi-Language Speech (MLS) corpora collected by the LDC. The CTS Challenge is currently organized in a similar manner to the SRE19 CTS Challenge, offering only an open training condition using two evaluation subsets, namely Progress and Test. Unlike in the SRE19 Challenge, no training or development set was initially released, and NIST has publicly released the leaderboards on both subsets for the CTS Challenge. Which subset (i.e., Progress or Test) a trial belongs to is unknown to challenge participants, and each system submission needs to contain outputs for all of the trials. The CTS Challenge has also served, and will continue to do so, as a prerequisite for entrance to the regular SREs (such as SRE21). Since August 2020, a total of 53 organizations (forming 33 teams) from academia and industry have participated in the CTS Challenge and submitted more than 4400 valid system outputs. This paper presents an overview of the evaluation and several analyses of system performance for some primary conditions in the CTS Challenge. The CTS Challenge results thus far indicate remarkable improvements in performance due to 1) speaker embeddings extracted using large-scale and complex neural network architectures such as ResNets along with angular margin losses for speaker embedding extraction, 2) extensive data augmentation, 3) the use of large amounts of in-house proprietary data from a large number of labeled speakers, 4) long-duration fine-tuning.  ( 2 min )
    Scalable Sensitivity and Uncertainty Analysis for Causal-Effect Estimates of Continuous-Valued Interventions. (arXiv:2204.10022v1 [cs.LG])
    Estimating the effects of continuous-valued interventions from observational data is critically important in fields such as climate science, healthcare, and economics. Recent work focuses on designing neural-network architectures and regularization functions to allow for scalable estimation of average and individual-level dose response curves from high-dimensional, large-sample data. Such methodologies assume ignorability (all confounding variables are observed) and positivity (all levels of treatment can be observed for every unit described by a given covariate value), which are especially challenged in the continuous treatment regime. Developing scalable sensitivity and uncertainty analyses that allow us to understand the ignorance induced in our estimates when these assumptions are relaxed receives less attention. Here, we develop a continuous treatment-effect marginal sensitivity model (CMSM) and derive bounds that agree with both the observed data and a researcher-defined level of hidden confounding. We introduce a scalable algorithm to derive the bounds and uncertainty-aware deep models to efficiently estimate these bounds for high-dimensional, large-sample observational data. We validate our methods using both synthetic and real-world experiments. For the latter, we work in concert with climate scientists interested in evaluating the climatological impacts of human emissions on cloud properties using satellite observations from the past 15 years: a finite-data problem known to be complicated by the presence of a multitude of unobserved confounders.  ( 2 min )
    Exploring Structural Sparsity of Deep Networks via Inverse Scale Spaces. (arXiv:1905.09449v5 [cs.LG] UPDATED)
    The great success of deep neural networks is built upon their over-parameterization, which smooths the optimization landscape without degrading the generalization ability. Despite the benefits of over-parameterization, a huge amount of parameters makes deep networks cumbersome in daily life applications. Though techniques such as pruning and distillation are developed, they are expensive in fully training a dense network as backward selection methods, and there is still a void on systematically exploring forward selection methods for learning structural sparsity in deep networks. To fill in this gap, this paper proposes a new approach based on differential inclusions of inverse scale spaces, which generate a family of models from simple to complex ones along the dynamics via coupling a pair of parameters, such that over-parameterized deep models and their structural sparsity can be explored simultaneously. This kind of differential inclusion scheme has a simple discretization, dubbed Deep structure splitting Linearized Bregman Iteration (DessiLBI), whose global convergence in learning deep networks could be established under the Kurdyka-Lojasiewicz framework. Experimental evidence shows that our method achieves comparable and even better performance than the competitive optimizers in exploring the sparse structure of several widely used backbones on the benchmark datasets. Remarkably, with early stopping, our method unveils `winning tickets' in early epochs: the effective sparse network structures with comparable test accuracy to fully trained over-parameterized models, that are further transferable to similar alternative tasks. Furthermore, our method is able to grow networks efficiently with adaptive filter configurations, demonstrating a good performance with much less computational cost. Codes and models can be downloaded at {https://github.com/DessiLBI2020/DessiLBI}.  ( 3 min )
    Out-of-distribution generalization for learning quantum dynamics. (arXiv:2204.10268v1 [quant-ph])
    Generalization bounds are a critical tool to assess the training data requirements of Quantum Machine Learning (QML). Recent work has established guarantees for in-distribution generalization of quantum neural networks (QNNs), where training and testing data are assumed to be drawn from the same data distribution. However, there are currently no results on out-of-distribution generalization in QML, where we require a trained model to perform well even on data drawn from a distribution different from the training distribution. In this work, we prove out-of-distribution generalization for the task of learning an unknown unitary using a QNN and for a broad class of training and testing distributions. In particular, we show that one can learn the action of a unitary on entangled states using only product state training data. We numerically illustrate this by showing that the evolution of a Heisenberg spin chain can be learned using only product training states. Since product states can be prepared using only single-qubit gates, this advances the prospects of learning quantum dynamics using near term quantum computers and quantum experiments, and further opens up new methods for both the classical and quantum compilation of quantum circuits.  ( 2 min )
    Path-Specific Objectives for Safer Agent Incentives. (arXiv:2204.10018v1 [cs.AI])
    We present a general framework for training safe agents whose naive incentives are unsafe. As an example, manipulative or deceptive behaviour can improve rewards but should be avoided. Most approaches fail here: agents maximize expected return by any means necessary. We formally describe settings with 'delicate' parts of the state which should not be used as a means to an end. We then train agents to maximize the causal effect of actions on the expected return which is not mediated by the delicate parts of state, using Causal Influence Diagram analysis. The resulting agents have no incentive to control the delicate state. We further show how our framework unifies and generalizes existing proposals.  ( 2 min )
    Ultra Marginal Feature Importance. (arXiv:2204.09938v1 [stat.ML])
    Scientists frequently prioritize learning from data rather than training the best possible model; however, research in machine learning often prioritizes the latter. The development of marginal feature importance methods, such as marginal contribution feature importance, attempts to break this trend by providing a useful framework for explaining relationships in data in an interpretable fashion. In this work, we generalize the framework of marginal contribution feature importance to improve performance with regards to detecting correlated interactions and reducing runtime. To do so, we consider "information subsets" of the set of features $F$ and show that our importance metric can be computed directly after applying fair representation learning methods from the AI fairness literature. The methods of optimal transport and linear regression are considered and explored experimentally for removing all the information of our feature of interest $f$ from the feature set $F$. Given these implementations, we show on real and simulated data that ultra marginal feature importance performs at least as well as marginal contribution feature importance, with substantially faster computation time and better performance in the presence of correlated interactions and unrelated features.  ( 2 min )

  • Open

    I have a time series of time, temp, humidity, apparent temp, and ac/heater/fan state
    I want to create a NN that takes these readings and makes predictions about "what will the readings be in 5 minutes if I turn the AC on?". I'm thinking of training it with "the angle of the sun at that time", "temp", "humidity", and "ac/heater/fan state"(3) and then extracting data pairs spaced by 5 minutes where the system was in that state for the entire interval. Then I'm thinking I should use the 5-minute-later apparent temp as the training output. So the NN ultimately answers the question "what would be the apparent temperature if the system were to be in the given state for the next 5 minutes?" Am I on the right track here? submitted by /u/HasFiveVowels [link] [comments]  ( 1 min )
  • Open

    An AI painting some colorful pitbulls
    submitted by /u/p0goniphaft111 [link] [comments]
    Building A Pictionary App (sketch recognition model) with Gradio
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    Is there a AI which I can use to edit images(selfies etc.)?
    Like I mark some areas of my pictures and then selcect what should happend with them? submitted by /u/xXLisa28Xx [link] [comments]
    What AI can I use to make caricatures from pictures from people?
    Is artbreeder the best way to do it, or is there a better way? submitted by /u/xXNOdrugsForMEXx [link] [comments]  ( 1 min )
    Learning or working with AI? Come join us, we are a Discord Community with over 20'000 members! Ask questions, find teammates, share your projects, attend events, and much more to come!
    Programming is way more fun when you learn/work with someone. Help each other, ask questions, brainstorm, etc. There is just so much benefit to joining a community when you are in this field, especially when you cannot find the question you are looking for on stack overflow! 😉 This is the same thing with AI, and it is why a little less than two years ago I created a discord server. Where anyone learning or working in the field could come and share their projects, learn together, work together, and much more. The community has now over 20 000 members, which is unbelievable! So glad to see it growing and see everyone so active. We also have an amazing partnership with an AI company coming that is super exciting for the community. You definitely want to be there to enjoy all the benefits they will give us. Come join us if you are in the field of AI ! https://discord.gg/learnaitogether submitted by /u/OnlyProggingForFun [link] [comments]  ( 1 min )
    Analyse sentiment/tonality in social networks
    submitted by /u/akolonin [link] [comments]  ( 1 min )
    [Research] Explaining the Black Box Optimization Competition Winner Algorithm-HEBO Algorithm of AI Top Conference NeurIPS 2020
    submitted by /u/Creative_Habit_6868 [link] [comments]  ( 3 min )
  • Open

    [R] GAM(e) changer or not? An evaluation of interpretable machine learning models based on additive model constraints
    https://arxiv.org/abs/2204.09123 https://www.researchgate.net/publication/360079336_GAMe_changer_or_not_An_evaluation_of_interpretable_machine_learning_models_based_on_additive_model_constraints submitted by /u/Positive_Ad_1090 [link] [comments]
    How can you differentiate Kornia SIFT descriptor? [P]
    Kornia is a differentiable library for computer vision based on PyTorch. Does anyone have experience with their SIFT descriptor. What can you differentiate? submitted by /u/avd4292 [link] [comments]
    [D] Evaluation and Selecting Models: Base on Loss or Metrics?
    When comes to evaluating and selecting a model, should one focus on minimizing loss (i.e., sparse categorical crossentropy) or obtain high rated metrics (i.e., f1)? Often time the model highest rated metrics would generate higher loss than ones with lower ratings in metrics during validation/test sets. Some would say focus on metrics as loss are for the machine to optimize learning, what stays in the training, stays in the training. However, wouldn't loss be also an important element to consider since it also describe the performance of the model, particularly when obtained from the test set? How should one prioritize? Metrics/loss rules all or seek for balance? submitted by /u/Hydraze [link] [comments]  ( 1 min )
    [R] Optimize clustering for downstream task
    Assume to have a 2-step algorithm: 1) aggregate data points into clusters 2) feed the clusters to a downstream task (e.g. classification, regression, etc). Is there any work that explores how to optimize the clustering in 1) to achieve the best performance in the downstream task 2)? One example would be a differentiable clustering algorithm that receives gradients from the downstream task or a parametrized clustering algorithm whose parameters are automatically tuned to increase the performance of the downstream task. I have found very little on this topic in the literature, could you point me to some relevant work? submitted by /u/fedetask [link] [comments]  ( 1 min )
    [D] What is a good emoji aware pre-trained language model?
    I am classifying social media posts (facebook, instagram), with emojis being upwards of 100% of content. For example, you may want to tag "🤮🤮🤮" as in need for moderation, and "🤔🤔🤔" as prioritized for a response. Looking for a good model to fine tune I found BerTweet, which seems at least somewhat emoji aware. However it also has a ton of out-of-vocabulary results, both for emoji and semi-common English words, despite it's liberal use of emoji.demojize and splitting up more complex emoji: ​ https://preview.redd.it/t6ai3o8le1v81.png?width=687&format=png&auto=webp&s=c16157addbe1b3d34858708f3e6c7517e64d26ec A model like `xlm-roberta-base with a larger vocabulary (250k) and more robust tokenization seems to have some 500 emoji directly in its vocabulary directly, without converting them to text. This seems potentially more promising, but also guarantees a token like 🤮 is just out of vocabulary rather than being interpreted by word pieces. Has anyone here had experience with dealing with emoji in text classification, and what approaches were most successful? submitted by /u/sanderbaduk [link] [comments]  ( 2 min )
    [D] What is the best method to use metric network at finetune after contrastive learning?
    Hi, I have a question about how to use metric network after contrastive learning. If I have trained a network well with NCELoss, I would like to finetune this network to match the best output by input(It used at calculating NCELoss). Is there any good way to do it? ​ Thank you for reading! submitted by /u/Spiritual_Fig3632 [link] [comments]  ( 1 min )
    [D] Opinions needed - Anyone interested in mock peer review?
    We’d like to know if anyone is interested in participating in a mock peer review? Basically if you have a paper you’d like to get feedback on, and would like to review others’ papers in exchange, you’re welcome to continue reading. We are gauging public interest in mock peer review and exploring the possibility to host the reviews on DouBlind. We’d like to know your answers to the following questions: Are you interested in mock peer review? Do you want to do this privately (paper and review are kept inside a small group) or openly (paper and review are open)? How many papers do you like to review? Do you have any concerns? submitted by /u/DouBlindDotCOM [link] [comments]  ( 2 min )
    [Research] Explaining the Black Box Optimization Competition Winner Algorithm-HEBO Algorithm of AI Top Conference NeurIPS 2020
    This is reproduced from Zhihu and translated by DeepL, only used for enthusiasts to communicate. ​ MindSpore, as an end-to-edge cloud collaborative full-scenario AI open source framework, takes into account the flexibility of academic research and the high-performance needs of industry, supports end-to-edge cloud full-scene business, and brings developers a simpler programming, easier debugging, superior performance, and more flexible deployment experience, which has received widespread attention and application in the industry and has been open source on 2020.3.28, and is the Gitee The highest index of open source software. Welcome to participate in open source contributions, model crowdsourcing collaboration, industry innovation and application, algorithm innovation, academic collabora…  ( 3 min )
  • Open

    Data Pre-Processing in TF-Agents
    Hi everyone, this is my first post please go easy on me. I'm currently playing around with a bigger Model in tf-agents. I worked only with structured data (TF, SKlearn, Pandas...). Now I'm struggling a bit with the preprocessing and where in the architecture to place it. I use multiple Inputs and encoding layers for each of them. For the training of the Encoders I used some SKLearn pre-processor (StandardScaler, MinMaxScaler, KBinsDiscretizer). I try to reuse the pre-processing pipeline in the model or extract the information for other pre-processing mechanisms(e.g. pre-processing tf layers) My current options I came up with: Incorporate it directly into the Environment and return the pre-processed observation Pro easy, can probably use my SKlearn pipeline Contra I'd like to keep the architecture clean, so the environment should only give out raw values and not prepared values for a certain model Use an environment wrapper around the "raw" environment Pro "raw" environment needs no tuning Contra not sure if I can use my pipeline here and not sure if I'm taking a bad path here Use pre-processing TF layers Pro most of the API is there and can be used in my Encodernetworks, seems to be the TFic way Contra the SKlearn pre-processor have values for each column. The layers (e.g. rescale ) seems to only take one configuration for a whole tensor. I could probably create a layer for each value in the Tensor but that doesn't feel that it is supposed to be like that. If you have more options or can share your experience one of the options mentioned above I would be very glad. submitted by /u/Kjiessar [link] [comments]  ( 1 min )
    Masking in RNN in the actor network
    I am using PPO in the context of multi-agent RL. I was wondering if PyTorch has a way of handling when hidden states should be reinitialized to zeros. What I have found is this implementation: def forward(self, x, hxs, masks): if x.size(0) == hxs.size(0): x, hxs = self.rnn(x.unsqueeze(0), (hxs * masks.repeat(1, self._recurrent_N).unsqueeze(-1)).transpose(0, 1).contiguous()) x = x.squeeze(0) hxs = hxs.transpose(0, 1) else: # x is a (T, N, -1) tensor that has been flatten to (T * N, -1) N = hxs.size(0) T = int(x.size(0) / N) # unflatten x = x.view(T, N, x.size(1)) # Same deal with masks masks = masks.view(T, N) # Let's figure out which steps in the sequence have a zero for any agent # We will always assume t=0 has a zero in it as that makes the logic cleaner has_zeros = ((masks[1:] == 0.0) .any(dim=-1) .nonzero() .squeeze() .cpu()) # +1 to correct the masks[1:] if has_zeros.dim() == 0: # Deal with scalar has_zeros = [has_zeros.item() + 1] else: has_zeros = (has_zeros + 1).numpy().tolist() # add t=0 and t=T to the list has_zeros = [0] + has_zeros + [T] hxs = hxs.transpose(0, 1) outputs = [] for i in range(len(has_zeros) - 1): # We can now process steps that don't have any zeros in masks together! # This is much faster start_idx = has_zeros[i] end_idx = has_zeros[i + 1] temp = (hxs * masks[start_idx].view(1, -1, 1).repeat(self._recurrent_N, 1, 1)).contiguous() rnn_scores, hxs = self.rnn(x[start_idx:end_idx], temp) outputs.append(rnn_scores) # assert len(outputs) == T # x is a (T, N, -1) tensor x = torch.cat(outputs, dim=0) # flatten x = x.reshape(T * N, -1) hxs = hxs.transpose(0, 1) submitted by /u/No_Possibility_7588 [link] [comments]  ( 2 min )
    Does anyone know of a chess environment written in JAX?
    I don't think an opensource one exists but figured I'd ask here because you never know what's laying around the internet! As an aside, if one doesn't exist, let me know if you're interested in partnering in writing one! Edit: For anyone wondering I need the env to be in jax because my muzero implementation is in jax and I need the env to run on TPU cores, not CPU submitted by /u/evanatyourservice [link] [comments]  ( 1 min )
    Papers that use neural networks solely for planning in large MDPS (i.e., no learning)
    I am looking for any papers that do the following: use neural networks in the RL pipeline as the state space is too large for calculating the optimal policy using the traditional tabular value iteration or policy iteration. In this setting, the model is completely known, i.e., no learning. Most papers I see with DeepRL assume that the transition probabilities are unknown and that they have access to a simulator that gives them the ability to query data points. I am looking for existing work in DeepRL where the transition probabilities are known but the problem is intractable using tabular methods. Any direction would be appreciated, thanks! submitted by /u/lolillini [link] [comments]  ( 1 min )
    PPO update without using NNs / batch updates
    Hello, im making a new post as i couldnt find any answers to this before (although this reddit post is similar to my issue) I am trying to implement a simple multivariate Gaussian policy without neural networks, basically using a standard policy gradient update with SGD + score function gradient, without batches. The reason for this is to avoid unstable updates, meaning too large updates in mean/variance. The idea is thus to use a trust region update, to keep the updates within some reasonable size. I am a little confused regarding the maximization of the surrogate objective. As seen in this stackoverflow post, we wish to maximize [pi/pi_old] , compared to [log(pi)] in vanilla PG. Since i do not use automatic differentiation, but one single stochastic descent, how do I find the gradient of pi/pi_old ? To my understanding, the flow of the algorithm is this: sample experience -> compute new policy parameters -> compare with previous policy -> construct surrogate function -> perform SGD on surrogate to get the actual new policy It is the last step i am struggling with. submitted by /u/Acrobatic-Ad-9189 [link] [comments]  ( 1 min )
    Simulating robotic arm for object manipulation
    I'll be starting my work for object manipulation using deep RL, and i would like to get start from the scratch, please recommend the source, tools, and software used for this purpose. but not be working on modeling the robot, instead will be using any robot with gripper which can be interfaced with ROS. Also please link the github repositories which can be helpfull in the learning process Thanks submitted by /u/Western-Age3148 [link] [comments]  ( 1 min )
    policy-encoding mapping implementation
    Hi, I want to check policy-encoding mapping e : (S → A) → R^k in Universal Successor Features Approximators. I don't know how to embedding network to another network. There are too many weights! Do you have any ideas? Thank you for reading! submitted by /u/Spiritual_Fig3632 [link] [comments]  ( 1 min )
    Useful Tools and Resources for Reinforcement Learning
    Found a useful list of Tools, Frameworks, and Resources for RL/ML. It covers Reinforcement learning, Machine Learning (TensorFlow & PyTorch), Core ML, Deep Learning, Computer Vision (CV). I thought I'd share it for anyone that's interested submitted by /u/Khaotic_Kernel [link] [comments]  ( 1 min )
  • Open

    Pix2Seq: A New Language Interface for Object Detection
    Posted by Ting Chen and David Fleet, Research Scientists, Google Research, Brain Team Object detection is a long-standing computer vision task that attempts to recognize and localize all objects of interest in an image. The complexity arises when trying to identify or localize all object instances while also avoiding duplication. Existing approaches, like Faster R-CNN and DETR, are carefully designed and highly customized in the choice of architecture and loss function. This specialization of existing systems has created two major barriers: (1) it adds complexity in tuning and training the different parts of the system (e.g., region proposal network, graph matching with GIOU loss, etc.), and (2), it can reduce the ability of a model to generalize, necessitating a redesign of the model for…  ( 7 min )
  • Open

    Secure AWS CodeArtifact access for isolated Amazon SageMaker notebook instances
    AWS CodeArtifact allows developers to connect internal code repositories to upstream code repositories like Pypi, Maven, or NPM. AWS CodeArtifact is a powerful addition to CI/CD workflows on AWS, but it is similarly effective for code-bases hosted on a Jupyter notebook. This is a common development paradigm for Machine Learning developers that build and train […]  ( 9 min )
  • Open

    Web Frameworks for Your Python Projects
    When we finished a Python project and roll it out for other people to use it, the easiest is to present our project as a command line program. If you want to make it friendlier, you may want to develop a GUI for your program so people can interact with the program with mouse clicks […] The post Web Frameworks for Your Python Projects appeared first on Machine Learning Mastery.  ( 36 min )
  • Open

    7 Ways Your Business Can Plan For Artificial Intelligence
    Artificial Intelligence is all over the world today. From the use of virtual assistants like Siri, Alexa, or Cortana, to improving…  ( 2 min )
  • Open

    By Land, Sea and Space: How 5 Startups Are Using AI to Help Save the Planet
    Different parts of the globe are experiencing distinct climate challenges — severe drought, dangerous flooding, reduced biodiversity or dense air pollution. The challenges are so great that no country can solve them on their own. But innovative startups worldwide are lighting the way, demonstrating how these daunting challenges can be better understood and addressed with Read article > The post By Land, Sea and Space: How 5 Startups Are Using AI to Help Save the Planet appeared first on NVIDIA Blog.  ( 3 min )

  • Open

    Last Week in AI: Chip Startup Funding Doubled, Google Text+Image Search, Analog AI, Criminal Robotaxi
    submitted by /u/regalalgorithm [link] [comments]
    AI Dream 31 - Spaceships Galore Planet VQGAN CLIP
    submitted by /u/LordPewPew777 [link] [comments]
    What would be the best approach to auto-generate comic panels (Garfield style) with drawings and speech bubbles, assuming I have tons of scans to use as training?
    I'm a software developer but I'm not really experienced in AI. Would it be best to train first for speech bubbles and separately for panel drawings? What kind of network is the best for this? Just thinking that it would be a cool project to have auto generated legible infinite comic strips for a semi niche comic strip that runs in my country. submitted by /u/dananite [link] [comments]  ( 1 min )
    Any Recommendations for AI Content Generation Software?
    Content generation is such a time-suck for small businesses, and it seems like an interesting vertical to apply AI. The AI would generate the content after being given a prompt. There are already a few tools trying this, but the quality doesn't seem to be very high. Are there better tools that I'm missing, or is the consumer-facing software so early-stage that it would be better to hire a data scientist and train an AI system specifically for this purpose? https://www.reddit.com/r/MachinesWrite/comments/f45eav/list_of_ai_text_generators/?utm_source=share&utm_medium=web2x&context=3 https://www.reddit.com/r/juststart/comments/axa8w3/ai_ml_text_generators/?utm_source=share&utm_medium=web2x&context=3 submitted by /u/CliffWoolum [link] [comments]  ( 1 min )
    Looking for enterprise conversational AI platform
    submitted by /u/sunstormfirefall [link] [comments]  ( 1 min )
    VICReg: Tutorial and Lightweight PyTorch Implementation blog post
    Here's a tutorial and lightweight PyTorch implementation of VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning. Hope you find it helpful! submitted by /u/thejashGI [link] [comments]
    Microsoft AI Researchers Develop ‘Ekya’ To Address The Problem Of Data Drift On The Edge Compute Box And Enables Both Retraining And Inference To Co-Exist On It
    Deep neural network (DNN) models for object recognition and classification, such as Yolo, ResNet, and EfficientNet, are used in video analytics applications such as urban mobility and smart automobiles. There is a symbiotic link between edge computing and video analytics, claiming that live video analytics is the “killer app” for edge computing. Edge devices come in various sizes and designs, but they are always resource-constrained compared to the cloud. Video analytics deployments send the videos to on-premises edge servers. The article handles the difficulty of supporting inference and retraining jobs on edge servers simultaneously, which necessitates navigating the fundamental tradeoff between the accuracy of the retrained model and the accuracy of the inference. Edge computation is preferred for video analytics because it eliminates the need for expensive network lines to broadcast videos to the cloud while simultaneously preserving video privacy. Edge computation has a finite amount of resources (e.g., with weak GPUs). The mismatch between the increasing rate of model compute needs, and the total cycles of processors exacerbate this problem. As a result, model compression is used in edge deployments. Continue reading our bite on this research Paper: https://www.microsoft.com/en-us/research/uploads/prod/2021/07/nsdi22spring-final74.pdf Github: https://github.com/edge-video-services/ekya submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    Is there a AI which is able to turn normal videos into sketches like the video below?
    submitted by /u/TheblackRook3 [link] [comments]  ( 1 min )
    Automatic Summaries of your Documents in Google Docs !
    submitted by /u/OnlyProggingForFun [link] [comments]
    How to achieve a training duration on MindSpore that's less than or equal to that on TensorFlow?
    submitted by /u/Creative_Habit_6868 [link] [comments]  ( 1 min )
    CypherZilla - The First Encoded NFT Made By AI To Support Trump. Upvote If You Want To Have A Huge Impact!
    CypherZilla on OpenSea https://reddit.com/link/u8he59/video/i9a29i5ywtu81/player submitted by /u/thecypherbeast [link] [comments]
    What price we have to pay for the progress in AI, have a look-
    https://www.sganalytics.com/blog/top-ethical-challenges-in-ai-the-price-of-progress/ submitted by /u/JencyJane [link] [comments]
    Collaboration AI video and music
    submitted by /u/Recent_Coffee_2551 [link] [comments]
  • Open

    Question about trained models
    Hello I have a question. For example, in the case of an inverted pendulum or cartpole, I train the model for the pole to be at 0 degrees (vertical) and it works. Then I want this same model to keep the pole at another position, for example, 3 degrees, do I have to train this model again for achieving this to or can I somehow use the model I already trained and what it learnt and input the new position I want it to be? idk if I explained myself I guess its mostly doubts about how to interact with the model and how to properly use a model that has already been trained. If anyone has some example of code (python, gym), on interacting with a trained model it would be really helpful. submitted by /u/Sleyck [link] [comments]  ( 1 min )
    Why is this implementation of PPO using a replay buffer?
    https://github.com/marlbenchmark/on-policy/blob/main/onpolicy/algorithms/r_mappo/r_mappo.py submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    What is the role of masks in the computation of GAE?
    submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    Question About Optimal Policy Guarantees in POMDPs
    I'm working on a project where I'm trying to prove the existence of a particular set of functions by showing it can be constructed as the solution to a Markov Decision Process. However, it seems that it's much simpler to convert it to a partially observable MDP, rather than a classic one. I know it's been proven that the set of optimal policies for a classic MDP is nonempty, and intuitively I feel like the same should hold for POMDPs, but I'm having a hard time finding a particular source proving such a thing. Does anyone know where I ought to look? submitted by /u/LessPoliticalAccount [link] [comments]  ( 1 min )
    Can reinforcement learning learn itself? A reply to 'Reward is enough' (PDF)
    submitted by /u/JBaloney [link] [comments]  ( 1 min )
    What is this line in the Sutton/Barto textbook referring to?
    In the first edition of the textbook, the section on actor-critic methods (link) describes the classical approach of using the temporal difference error 𝛿 to modify the probability of selecting action a in state s: https://preview.redd.it/fa35vut7ewu81.png?width=238&format=png&auto=webp&s=c1b8952b065a90ecd2b8c0c30b985e36d37dbc30 Then they briefly mention that one variation on the classical approach is to scale temporal difference error 𝛿 by the inverse of the probability of selecting the action a, where that probability is given by 𝜋(s, a): https://preview.redd.it/69gd3zmbdwu81.png?width=375&format=png&auto=webp&s=b66a7d5eef3c2b7bc256473aed728223921a751c They say: " These issues were explored early on, primarily for the immediate reward case (Sutton, 1984; Williams, 1992) and have not been brought fully up to date." This idea is relevant to a project I'm working on, and I'd like to read more about it. But the references seem to be dead ends: Sutton 1984 is his PhD thesis, which I can't find a digital copy of, and Williams 1992 is this paper, which doesn't seem to contain this idea. Also this section doesn't seem to appear in the second edition of the textbook. You folks are much smarter than I am: Does modifying the update in this way mean anything to you? Are there modern approaches that do something like this? Or should I assume it was a little-explored idea in the early days that has been more-or-less forgotten? Thanks very much! submitted by /u/Careless-Argument-37 [link] [comments]  ( 2 min )
    Reinforcement Learning with delays
    I was wondering what methods there are for RL with time delay other than augmenting the state space with the action buffer or using a model to undelay the environment. I've seen this post How to deal with the time delay in reinforcement learning? - Artificial Intelligence Stack Exchange however it's rather brief and I wondered if there were any more recent advancements. I am also struggling to understand partial trajectory resampling ( 2010.02966.pdf (arxiv.org) ) and the code in the accompanying repo. GitHub - rmst/rlrd: PyTorch implementation of our paper Reinforcement Learning with Random Delays (ICLR 2020) I was wondering how we can resample actions in environments with constant delays if those actions are used in the state space for all subsequent chosen actions? submitted by /u/SuperDuperDooken [link] [comments]  ( 1 min )
    Is it stupid to use rl to control solar panel angle?
    submitted by /u/Professional_Card176 [link] [comments]  ( 1 min )
    How can I use the environment in Emergence of Locomotion Behaviours in Rich Environments?
    Hi, I want to train my agent in the environment used in "Emergence of Locomotion Behaviours in Rich Environments". Here is a video about that https://www.youtube.com/watch?v=hx_bgoTF7bs. Is the environment released? Thanks for reading. submitted by /u/Spiritual_Fig3632 [link] [comments]  ( 1 min )
  • Open

    [P] mGPT model released: a multilingual gpt-3-like model for 61 language
    Hi everyone. Today we released the mGPT model: multilingual generative pre-trained transformer The checkpoints are available on Huggingface model page The example usage is at the Github repo https://github.com/ai-forever/mgpt The model has 1.3 billion parameters The context length is 512 tokens. The model can generate sequences after the input prompt, can be used for fine-tuning or for zero- and few-shot learning: from transformers import GPT2LMHeadModel, GPT2Tokenizer model_name = "sberbank-ai/mGPT" tokenizer = GPT2Tokenizer.from_pretrained(model_name) model = GPT2LMHeadModel.from_pretrained(model_name) model.cuda() model.eval() texts = [ "My favourite holiday is ", "Իմ սիրելի տոնն է ", "Моє улюблене свято ", "mi fiesta favorita es ", "मेरी पसंदीदा छुट्टी है", "我最喜欢的节日是", "Min…  ( 2 min )
    [P] Deep Learning GPU Benchmark: A Latency-Based Approach
    Hi r/MachineLearning! I want to share with you a fun side project of mine on benchmarking the GPUs for deep learning: [project page]. https://preview.redd.it/7olwqyze5yu81.png?width=2041&format=png&auto=webp&s=25aecb9733366720a2be5cecc2048eb2a734c9b9 Here are some key features: It helps to estimate the runtime of algorithms on a different GPU. It measures GPU processing speed independent of GPU memory capacity. It contains adjustable weightings through interactive UIs. The project page also explains how this benchmark differs from existing ones, and why this benchmark is more relevant to academic research. I would love to know what you think! submitted by /u/roll-a-dice [link] [comments]  ( 1 min )
    [D] What's your perfect laptop for deep learning research?
    I'm using mbp 2015, it's a pretty solid laptop, I like it a lot, though it feels slow and I've started to look for a replacement. Given that I run all experiment on gpu dedicated servers, my laptop serves me as a typewriter, it's ok, but I'd like to get more out of it. Frankly I'm a bit disappointed by 2021 Macbooks, hope they'll be improved in 2022. Recently lambda labs together with razer announced their tensorbook https://lambdalabs.com/deep-learning/laptops/tensorbook , their pricing looks weird to me, the more you pay the more years of support you have, that's the only thing which differentiates base bundle from enterprise. Also there is no option to customize hardware for it, though basic bundle itself looks ok, its price is $3500 like M1 Max's. What's your opinion about this laptop in particular? would you buy it? generally this laptop looks like a cool thing to have for local model development even from a tent somewhere in Nepal, given that you have enough power banks to charge it. :) What's your choice of a laptop for DL? My biggest requirement is a durable laptop which will serve at least 5 years, better with NVIDA GPU for development and debugging. submitted by /u/taras-sereda [link] [comments]  ( 1 min )
    [R] Deep models of superficial face judgments (PNAS)
    ​ Transformations that alter the perception of target faces Paper: https://www.pnas.org/doi/10.1073/pnas.2115228119 Dataset: https://onemillionimpressions.com/ submitted by /u/joshuacpeterson [link] [comments]
    [R] Planting Undetectable Backdoors in Machine Learning Models
    submitted by /u/Wiskkey [link] [comments]
    [P] VICReg: Tutorial and Lightweight PyTorch Implementation blog post
    Here's a tutorial and lightweight PyTorch implementation of VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning. Hope you find it helpful! submitted by /u/thejashGI [link] [comments]
    [P] Announcing cleanlab 2.0: Automatically Find Errors in ML Datasets
    Hi folks. This morning I released the new cleanlab 2.0 Python package for automatically finding errors in datasets and machine learning/analytics with real-world, messy data and labels. tl;dr - cleanlab provides a framework to streamline data-centric AI. https://preview.redd.it/hq1kyasvwwu81.png?width=2279&format=png&auto=webp&s=4fa3c82ec66d685c8fc4f95c5d9a0fc4be192d6b After 1.0 launch last year, engineers used cleanlab at Google to clean and train robust models on speech data), at Amazon to estimate how often the Alexa device doesn’t wake, at Wells Fargo to train reliable financial prediction models, and at Microsoft, Tesla, Facebook, etc. Joined by two good friends from grad school, we completely rebuilt cleanlab 2.0 to work for all data scientists, ML datasets, and models; and hit a…  ( 2 min )
    [P] Galp Hackathon - Win 10.000€ from home!
    If you are passionate about Data & AI we have the perfect challenge for you! The applications for Galp’s Hackathon Retail 4.0 are OPEN! With this Hackathon, Galp is challenging the community to propose solutions to specific problems and use cases that they think could improve their typical customer journey in the service stations. Gather a team and come up with an innovative solution for a chance of winning 10.000€! Let’s shape the future of Galp's retail? Apply now: https://taikai.network/en/galp/hackathons/retail40 https://preview.redd.it/wkfb6ybuwwu81.png?width=3334&format=png&auto=webp&s=deef13767df5ba607e387ce4e278ae3981d93582 submitted by /u/migueldsalmeida [link] [comments]  ( 1 min )
    [D] Imbalanced multi class classification 📌
    I'm working on a Machine Learning problem for multi class classification with imbalanced classes distribution, so obviously my model favours classes with more data and fails to predict classes with few data, what are the techniques I can use to help the model distinguish all the classes the same way ? P.S I'm avoiding to use SMOTE method to train the model on real used data rather than generated submitted by /u/According-Promise-23 [link] [comments]  ( 2 min )
    [R] CVPR 2022 - Photorealistic Monocular 3D Reconstruction of Humans Wearing Clothing
    submitted by /u/SleekEagle [link] [comments]
    [R] My continuously updated machine learning research notes
    Dear ML researchers, For the past many years, I've been updating my machine learning research notes for my PhD students and everyone online continuously. I don't like uploading to arxiv to get "citations", and GitHub serves me well: Hope they are useful for you: https://github.com/roboticcam/machine-learning-notes Richard, submitted by /u/MLknowledge [link] [comments]  ( 1 min )
    [D] Correcting for imbalance in regression datasets
    Hi, I am performing a Image --> scalar regression. The output scalar I am trying to estimate follows a roughly Gaussian distribution. I notice that the DNN output is biased to output values towards the mean (makes sense). ​ This seems like a problem of imbalanced data. For classification, I can oversample minority classes. What is the equivalent for regression? Is there an equivalent technique for regression where we oversample "outliers" and undersample central values. submitted by /u/rsandler [link] [comments]  ( 1 min )
    Building Dense Passage Retrievers [P]
    Hi, I made a video explaining the ideas behind building a Dense Passage Retriever(DPR). Whenever we talk about retrievers, we mostly refer to the DPR formulation which appeared in this paper. A lot of publicly available implementations also use this formulation. In a previous video, we discussed how to use the DPR End-to-End QA system which uses DPR with a QA model. In this video, we solely focus on retrievers and the ideas behind building them. The implementation is quite similar to retrievers pre-trained with Inverse Close Task. This video is part 8 of 9 video series on Open-domain question answering using Dense retrievers. Thanks for the support and I will appreciate any feedback. https://www.youtube.com/watch?v=w61p0HLo7gc submitted by /u/infiniteakashe [link] [comments]  ( 1 min )
    [D] How do you usually run sanity checks when training GANs ?
    Hi, I have been studying super-resolution with gans and took a look at SRGAN et ESRGAN. I have spent the whole day running experiments in order to find if I can manage to overfit on a single batch of 16 / 32 / 128 examples (MNIST). I have found out that it's almost impossible to use this tactic as a sanity check because it simply cannot generate good quality samples. I would like to know what are your thoughts on this, and how you would run sanity checks regarding GANs. ​ Thank you ! submitted by /u/Frizzoux [link] [comments]  ( 1 min )
    [D] Amazon Releases a New Multilingual Dataset for NLU
    https://www.amazon.science/blog/amazon-releases-51-language-dataset-for-language-understanding submitted by /u/__lawless [link] [comments]
    [D] How to handle features that apply to a whole csv-file vs single rows?
    Hi all, I have csv-files (~300) with a fixed set of columns (~40) but varying number of rows (sum of all rows ~300 000) and multiple labels per csv that I want to predict. Because of the limited number of csv-files and as a first try I am predicting the labels row-wise (attaching the label to all the rows of one csv-file) which works well for some labels but not for others. Currently, I am calculating some features for every row and just appending them to the row and some features for the whole csv-file and appending them to every row. Two problems are now arising that I would like to hear some input about: The number of features per csv is growing and it seems like a waste to copy them to every row. For some labels it is probably reasonable to throw away most of the rows and only feed in a handful. How would you design a structure that incorporates the limited number of csv-files and the different ways to treat features (row vs. csv)? submitted by /u/tlklk [link] [comments]  ( 2 min )
    [R][P] Differences in publishing a paper at a conference and in a journal?
    Hi! I am an undergrad and I am going to start my MS in CS this fall. My research interest is mainly in Multimodal Learning for language and Speech. I have written papers before but both my papers have been peer reviewed journal papers (Knowledge-Based Systems, Elsevier) [1] [2] I now want to start publishing papers in conferences since I have noticed that it is much easier to get noticed and recieve reviews when the paper is presented at a conference. I want to understand how different is the publication process for conferences? I also wanted recommendations on conferences in the NLP and Speech area, considering this will be my first conference paper. Thanks! (I would also appreciate reviews on my papers if anyone has the time to look them over. Thanks!) submitted by /u/prabhav55221 [link] [comments]  ( 4 min )
    [D] How do you get the maximum of arxiv sanity?
    Basically, I don't want to phrase this as a a "how-to" post but arxiv-sanity-lite really bothers me. How do you guys find recent papers in your area of interest which are promising besides following what is published at major conferences? I believe the website is "too lightweight". For example, what if I am interested in computer vision papers and I specify that in the tags field (i.e. explicitly typing "computer vision"). How can I list the papers based on a score (basically goodness of the paper)? Why does using shortcuts (basically links) like `````recommend over last week or recommend over last 3 days always (at least for me) end up with 0 results? I've never used the original arxiv-sanity before so I strongly believe that there is something that I am missing. submitted by /u/Icy_Fisherman7187 [link] [comments]  ( 1 min )
    [N] New opportunity: PhD Candidate within multisensor data fusion and applied machine learning for analysis of Arctic sea ice
    The Norwegian University of Science and Technology (NTNU) has a vacancy for PhD Candidate within the DIGITALSEAICE project . The project aims to build a multi-scale digital infrastructure that integrates local and regional sea ice models for improved forecasting and understanding of variations in polar ice conditions. More information here: https://www.jobbnorge.no/en/available-jobs/job/224802/ submitted by /u/KatjaKim [link] [comments]  ( 1 min )
    [P] Efficient Deep Learning Book
    We are working on a book that focuses on deep learning efficiency techniques such as quantization, pruning, distillation, etc. for both server-side as well as on-device (smartphones, IoT, etc.) applications. The goal is to introduce these ideas in a single place, without having to parse many papers, try to get a working code sample, and then spend time debugging. With the accompanying codelabs, we hope that our readers can make their models 4-20x smaller, faster, and better in quality. We have released the first four chapter's draft PDFs, and would truly appreciate any sort of comments / feedback. Book: efficientdlbook.com Feedback: hello@efficientdlbook.com submitted by /u/EfficientDLBook [link] [comments]  ( 1 min )
    [D] Interview w/ Google Brain researchers on Sparse Expert Models (Switch Transformers, GLAM, and more...)
    https://youtu.be/ccBMRryxGog This video is an interview with Barret Zoph and William Fedus of Google Brain about Sparse Expert Models. Sparse Expert models have been hugely successful at distributing parts of models, mostly Transformers, across large array of machines and use a routing function to effectively route signals between them. This means that even though these models have a huge number of parameters, the computational load for a given signal does not increase because the model is only sparsely activated. Sparse expert models, such as Switch Transformers and GLAM can scale up to trillions of parameters and bring a number of desirable properties. We discuss everything from the fundamentals, history, strengths and weaknesses, up to the current state of the art of these models. ​ OUTLINE: 0:00 - Intro 0:30 - What are sparse expert models? 4:25 - Start of Interview 5:55 - What do you mean by sparse experts? 8:10 - How does routing work in these models? 12:10 - What is the history of sparse experts? 14:45 - What does an individual expert learn? 19:25 - When are these models appropriate? 22:30 - How comparable are sparse to dense models? 26:30 - How does the pathways system connect to this? 28:45 - What improvements did GLAM make? 31:30 - The "designing sparse experts" paper 37:45 - Can experts be frozen during training? 41:20 - Can the routing function be improved? 47:15 - Can experts be distributed beyond data centers? 50:20 - Are there sparse experts for other domains than NLP? 52:15 - Are sparse and dense models in competition? 53:35 - Where do we go from here? 56:30 - How can people get started with this? ​ Papers: Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (https://arxiv.org/abs/2101.03961) GLaM: Efficient Scaling of Language Models with Mixture-of-Experts (https://arxiv.org/abs/2112.06905) Designing Effective Sparse Expert Models (https://arxiv.org/abs/2202.08906) submitted by /u/ykilcher [link] [comments]  ( 1 min )
    [P] Interactive semantic map of ICLR 2022
    Next week ICLR 2022 is taking place. Fully virtual and 1000+ high quality papers. To make sense of this volume of papers we have indexed the papers and provide an interactive semantic map of #ICLR2022, check out: https://search.zeta-alpha.com/?q=&d=ly&doc_sources=ICLR&sort_by=authority To enjoy the full map, click on [Explore more] and then enter full screen mode. We will also discuss the program and 10 must read papers in the Zeta Alpha "Trends in AI" ICLR edition webinar Monday 25th, for which you can sign up here. https://us06web.zoom.us/webinar/register/7816505274568/WN_82DzwhXZQbOCSTWgaI9xMw Looking forward to meet you online at ICLR 2022! https://preview.redd.it/6wdqj4ru7uu81.jpg?width=2202&format=pjpg&auto=webp&s=c97417c9ea39919041949bf3aa38ad33bb6eca5a submitted by /u/EngineerZetaAlpha [link] [comments]  ( 1 min )
    [R] MindSpore Paper Interpretation: MIEHDR CNN: Main Image Enhancement based Ghost-Free High Dynamic Range Imaging using Dual-Lens Systems
    This article is reproduced from Zhihu and translated by DeepL for enthusiasts to communicate. 1. Research Background High dynamic range images (HDR) are mainly oriented to picture display technology. In a certain scene, if the range of high and low luminance areas exceeds the maximum luminance range of the image, the display effect will be greatly reduced, and HDR is to better solve this problem, it can record a broader range of luminance images, so as to obtain a more effective display effect. The current solution to the problem of generating high dynamic range images (HDR) focuses on the fusion of two low dynamic range (LDR) images of different exposures taken with the same camera. In such a solution by the camera shake or object movement during the exposure time to produce the proble…  ( 3 min )
    [D] Most efficient way to use large image datasets with clusters for ML?
    I am having trouble finding some general information on this subject. I know I am down the rabbit hole when google doesn't have an answer. I want to know best practices and information on using clusters for machine learning with large amounts of data. I believe I have a close to an optimal solution but wanted to get some other opinions on the subject. My current setup: AWS EKS Kubernetes for a cluster Kubeflow for ML platform Katib for HPT jobs Pytorch for custom models Spot instance GPUs Lustre for file serving to the models My Data: Millions of Images stored in S3 ~50TB of data What is the most efficient way to move my data to the cluster? My current approach: Preprocess the data with a dedicated instance and store it in S3 Master runs on a dedicated node Katib spins up a set number of GPU spot nodes A claim is made, and an FSx Lustre system is generated for the pod Advantages: Very fast training and data movement with spot training Disadvantages: I have to spin up several Lustre systems for the training Preprocess the data with a dedicated instance and store it in S3 Possible alternative Same as above but use EFS as a distributed file system so I don't have to wait for Lustre Advantages: Potentially cheaper as I have only one FS Disadvantages: Slow throughput, read this was a bad idea Master runs on a dedicated node Other alternatives UseKatib spins up a PyTorch streaming function with S3(boosted transfer speed)set number of GPU spot nodes Every pod starts a claim is made and downloads data to an EBS Give up and switch to SageMakerFSx Lustre system is generated for the pod Anyone with experience in these technologies I would really appreciate hearing your thoughts. submitted by /u/thewineiswater [link] [comments]  ( 1 min )
    [D] [P] Neural network: same prediction for different inputs
    I am getting the same prediction for different inputs. I am trying to use a regressional neural network. Since data is huge, I am training one example at a time. Here is a simplified version of my code. model = Sequential() model.add(Dense(10000, input_dim=212207, kernel_initializer='normal', activation='relu')) model.add(Dense(100, activation='relu')) model.add(Dense(1, kernel_initializer='normal')) model.compile(loss='mean_squared_error', optimizer='adam') for i in range(10000000): #X is input with 212207 values #Y is a output value if i<6000000: model.fit(X.transpose(), Y, epochs=30, batch_size=1, verbose=0) else: prediction=model.predict(X.transpose()) I made sure that I am training on different examples and trying predictions on different examples. I am still getting the same prediction value for all testing inputs. I think I made some mistake in defining the model for regression neural network. Can you please check if the code is correct? submitted by /u/exoplanet_hunter [link] [comments]  ( 1 min )
  • Open

    Fixed points of bilinear transformations
    Introduction I was puzzled the first time I saw bilinear transformations, also known as Möbius transformations. I was in a class where everything had been abstract and general, and suddenly thing got very concrete and specific. I wondered why we had changed gears, and I wondered how there could be much to say about something […] Fixed points of bilinear transformations first appeared on John D. Cook.  ( 2 min )
    Partitioning complexity
    This post looks at how to partition complexity between definitions and theorems, and why it’s useful to be able to partition things more than one way. Quadratic equations Imagine the following dialog in an algebra class. “Quadratic equations always have two roots.” “But what about (x – 5)² = 0. That just has one root, […] Partitioning complexity first appeared on John D. Cook.  ( 4 min )
  • Open

    Hidden Interfaces for Ambient Computing
    Posted by Alex Olwal, Research Scientist, Google Augmented Reality and Artem Dementyev, Hardware Engineer, Google Research As consumer electronics and internet-connected appliances are becoming more common, homes are beginning to embrace various types of connected devices that offer functionality like music control, voice assistance, and home automation. A graceful integration of devices requires adaptation to existing aesthetics and user styles rather than simply adding screens, which can easily disrupt a visual space, especially when they become monolithic surfaces or black screens when powered down or not actively used. Thus there is an increasing desire to create connected ambient computing devices and appliances that can preserve the aesthetics of everyday materials, while providing …  ( 7 min )
  • Open

    Specify and extract information from documents using the new Queries feature in Amazon Textract
    Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, and data from any document or image. Amazon Textract now offers the flexibility to specify the data you need to extract from documents using the new Queries feature within the Analyze Document API. You don’t need to know the structure of the […]  ( 11 min )
  • Open

    Understanding the Difference between Loss Functions and Metrics in Machine Learning/Deep Learning
    Yes! You read the heading right. There’s indeed a difference between loss functions and Metrics in the field of Machine Learning. However…  ( 2 min )
  • Open

    A new state of the art for unsupervised vision
    MIT CSAIL scientists created an algorithm to solve one of the hardest tasks in computer vision: assigning a label to every pixel in the world, without human supervision.  ( 7 min )
    Anticipating others’ behavior on the road
    A new machine-learning system may someday help driverless cars predict the next moves of nearby drivers, cyclists, and pedestrians in real-time.  ( 7 min )
  • Open

    Tooth Tech: AI Takes Bite Out of Dental Slide Misses by Assisting Doctors
    Your next trip to the dentist might offer a taste of AI. Pearl, a West Hollywood startup, provides AI for dental images to assist in diagnosis. It landed FDA clearance last month, the first to get such a go-ahead for dentistry AI. The approval paves the way for its use in clinics across the United Read article > The post Tooth Tech: AI Takes Bite Out of Dental Slide Misses by Assisting Doctors appeared first on NVIDIA Blog.  ( 4 min )
    GFN Thursday Is Fit for the Gods: ‘God of War’ Arrives on GeForce NOW
    The gods must be smiling this GFN Thursday — God of War today joins the GeForce NOW library. Sony Interactive Entertainment and Santa Monica Studios’ masterpiece is available to stream from GeForce NOW servers, across nearly all devices and at up to 1440p and 120 frames per second for RTX 3080 members. Get ready to Read article > The post GFN Thursday Is Fit for the Gods: ‘God of War’ Arrives on GeForce NOW appeared first on NVIDIA Blog.  ( 3 min )
  • Open

    Building Dense Passage Retrievers
    Hi, I made a video explaining the ideas behind building a Dense Passage Retriever(DPR). Whenever we talk about retrievers, we mostly refer to the DPR formulation which appeared in this paper. A lot of publicly available implementations also use this formulation. In a previous video, we discussed how to use the DPR End-to-End QA system which uses DPR with a QA model. In this video, we solely focus on retrievers and the ideas behind building them. The implementation is quite similar to retrievers pre-trained with Inverse Close Task. This video is part 8 of 9 video series on Open-domain question answering using Dense retrievers. Thanks for the support and I will appreciate any feedback. https://www.youtube.com/watch?v=w61p0HLo7gc submitted by /u/infiniteakashe [link] [comments]  ( 1 min )
    NN from Scratch: #4 Backward Propagation | Kolbenkraft
    submitted by /u/cjmodi306 [link] [comments]
    Searching for volunteers for ML-based Ukrainian volunteer project.
    We are searching for trustworthy volunteers with some free time who would like to contribute to a digital Ukrainian volunteer project. Our system heavily relies on an image recognition system with a number of specialized filters involvg facial recognition, object recognition, logo detection, photoshop detection etc. People with professional experience with any of these things is preferred, but novice ML people are welcome to join us in a different capacity. DM to learn more about the project, glad to discuss the details with you. submitted by /u/eelgirl [link] [comments]  ( 1 min )
  • Open

    18 Differences Between Good and Great Data Scientists
    If you are employed as a data scientist and have survived (or strived!) in your position for more than a year, chances are you are at least a good data scientist. This is particularly true if you were promoted. The difference between a mediocre and a good data scientist will be the topic of a… Read More »18 Differences Between Good and Great Data Scientists The post 18 Differences Between Good and Great Data Scientists appeared first on Data Science Central.  ( 6 min )

  • Open

    Is there any difference between how DDPG and PPO use the replay buffer?
    submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    Any tips for a prospective graduate student in Reinforcement Learning?
    Hello Everyone, I apologize ahead of time if posts like this aren't looked well upon on this sub, but I couldn't find rules against this and I also think this is the best, most niche sub for my question. I also made a new account just to be safe haha. ​ Anyways, I will be graduating this spring with a BS in Computer Science and a BA in Mathematics. I have been researching Machine Learning since my sophomore year (adversarial machine learning) under a professor at my university and recently took upon a second, concurrent research position in RL since last summer. ​ My goal is to get into a PhD program at a higher level than my current university (my current university is good, but doesn't really have much of an AI focus as I've already taken all the AI grad courses as an undergrad). I'm…  ( 3 min )
    Task Allocation problem with graph representation
    Hey everyone, I've recently started working on a task allocation problem using RL. I'd just like to make sure my thinking is correct on how to best approach the problem. At the moment, we have (effectively) a graph traversal sim for n number of agents, where the goal is to minimize the total distance over an episode, as determined by setting the correct tasks. The task supplied to each agent will determine the route that is taken, and therefore the distance. The current idea is to supply an input graph that also contains information on the current location of the agents. A second input would be the set of available tasks. The expected output would be done through a pointer network, where we produce a reordered set of the tasks in descending order of optimality. When step is called, the sim runs until a new task is needed (agent completes it's route). ​ In general, does anyone know a good way to represent the inputs and output of this problem? A pointer network seems like it could work to produce actions, but if I need to do a forward pass for every agent, it seems that there would be no consideration of other agents when determining tasks (We shouldn't have 2 agents doing the same task). For the graph representation, a graph nn seems like an obvious choice, but I just wanted to see if anyone had any insight on why they may or may not be used. submitted by /u/asdfsflhasdfa [link] [comments]  ( 1 min )
    Universities working on reinforcement learning for robotics.
    Can you name any good universities (with high acceptance rate) which are working on reinforcement learning for robotics and also accept students from other branches (i.e. Electrical, Mechanical Engineering). submitted by /u/Better-Ad8608 [link] [comments]  ( 1 min )
    Is the game of chess a finite MDP?
    In the standard intro to RL book, I have read that any MDP that has finite actions and states is a finite MDP. But that limit is subjective. So there are approximately 1045. If I limit myself to 105 states, can I say that chess isn't a finite MDP? submitted by /u/BraveProfessional656 [link] [comments]  ( 1 min )
    Reinforcement learning over traditional machine learning method in Finance/Banking ?
    I am currently studying use cases of RL in finance/banking/insurance and I am keen to understand what are its advantages and disadvantages than traditional methods. submitted by /u/kachua26 [link] [comments]  ( 1 min )
  • Open

    FormNet: Beyond Sequential Modeling for Form-Based Document Understanding
    Posted by Chen-Yu Lee and Chun-Liang Li, Research Scientists, Google Research, Cloud AI Team Form-based document understanding is a growing research topic because of its practical potential for automatically converting unstructured text data into structured information to gain insight about a document’s contents. Recent sequence modeling, which is a self-attention mechanism that directly models relationships between all words in a selection of text, has demonstrated state-of-the-art performance on natural language tasks. A natural approach to handle form document understanding tasks is to first serialize the form documents (usually in a left-to-right, top-to-bottom fashion) and then apply state-of-the-art sequence models to them. However, form documents often have more complex layouts …  ( 8 min )
  • Open

    [D] Who are using physics informed neural networks (PINN) in the industry?
    I stumbled upon this JD from Hitachi Energy, which mentions PINN in the section of preferred background: https://www.linkedin.com/jobs/view/2923292435/ Is PINN gaining more attention? And are there more players? submitted by /u/Kohomologia [link] [comments]  ( 1 min )
    [D] Is quantum AI a real thing? (from the software perspective)
    Hi all I'm keeping an eye on state of the art in quantum hardware, but what about software? I can think of many questions and maybe some of you are in the field. What should be the impact of quantum on ML/DL, realistically? What might be a roadmap for the software? And would quantum simulators do already have some benefits on AI? What are the best projects out there? I've seen many but haven't been very convinced submitted by /u/IntelligentHat1657 [link] [comments]  ( 2 min )
    [D] A more fair AI freelancer marketplace that cares freelancers' career advance and benefits
    Hi, ML freelancers. I'm starting a freelancing marketplace, tailored only for AI talents, and I especially care about the welfares of freelancers, and plan to add these: (1) you will be more treated as the employees of the platform, thus we provide training(for all), potentially health care plan(for people have stably worked >20 hours a week), and career advance plan, mentors from experienced freelancers where you get to learn (2) open discussion between employers and you so that you can scope the project better, set a reasonable rate, and timeline (3) we potentially provide MLOPs tool to improve your productivity. (4) we avoid global competition by matching business only with local region-freelancer or areas that are more expensive. How attractive do you think this will be? And any of these benefits already been provided by upwork, freelancer, toptal, fierr? submitted by /u/meame2010 [link] [comments]  ( 2 min )
    [D] Diffusion models video tutorial
    Diffusion models have been behind a recent string of impressive generative results, including OpenAI's DALL-E 2. They’re powered by a simple yet expressive core mechanism. New video covering how they work: https://youtu.be/fbLgFrlTnGU submitted by /u/ariseff [link] [comments]
    [D] Building the Model Behind DoorDash’s Expansive Merchant Selection
    Interested in how DoorDash maintains a well performing and diverse selection in the numerous markets they operate in despite entering the delivery market relatively late ? I had the opportunity to collaborate in this project which involved building a number of models that measured customer preferences, identified market cuisine categories, and predicted merchants' performance on the platform. I wanted to share the approach and some of the technical details with the ML community to get feedback on what we can improve and to show this cool use case to others working on similar sales enablement based models. Check out the blog post I wrote and let me know what you think of our approach. Building the Model Behind DoorDash’s Expansive Merchant Selection submitted by /u/EfficientString7431 [link] [comments]  ( 1 min )
    [D] What's hot in deep learning research at the moment ?
    I took a break from deep learning( starting from last October) , now i want to get back, start with a new project and read papers . Where should i focus ? Should i keep working on vision transformers or maybe start something on geometric deep learning . What's hot and what's going on ? submitted by /u/ovotheking [link] [comments]  ( 1 min )
    [P] A simple PyTorch YOLOv1 training pipeline GitHub Repo
    https://github.com/sovit-123/yolov1_pytorch_voc07 ​ Also, I write about Deep Learning and Machine Learning on https://debuggercafe.com/ Please check it out and let me know if somebody wants any blog posts on a specific topic. submitted by /u/sovit-123 [link] [comments]
    [P] Programmatic: Powerful Weak Labeling
    Hi all!, Really excited to share a project we've been working on and get your feedback! We've made: Programmatic — an NLP annotation tool for building large labeled datasets for NLP without manual annotation Programmatic is like a REPL for data annotation. You: 1. Write simple rules/functions that can approximately label the data 2. Get near-instant feedback across your entire corpus 3. Iterate and improve your rules Finally, it uses a Bayesian label model [1] to convert these noisy annotations into a single, large, clean dataset, which you can then use for training machine learning models. You can programmatically label millions of datapoints in the time taken to hand-label hundreds. What we do differently from weak supervision packages like Snorkel/skweak[1] is to focus on UI to …  ( 2 min )
    [D] What's your opinion on project promoting posts in this sub? Your vote matters.
    There are many projects promoting in this sub, you may like or dislike. And if any of my posts you dislike, allow me to apologize first. However, it gets me to think. Several years ago I'm a moderator in a quite large forum, because I don't have enough time to fulfill my responsibilities, then I decided to retire (yes, they can, and I remained as the vip user which only retired moderators can be). This is a large community, a machine learning community. Besides continuously removing some of these posts, and no clear rules on it, can we do any better? We got all the data, and we just cannot train the model? Here are my three proposal, and please give some excellent ideas besides my poor ones: Self promoting post should have values other than itself, and not having annoying contents Self promoting project can be used as a tool in a non self promoting posts, as long as the posts creates valuable contents and the promoting is not obvious and annoying. Depends on the number of new project posts, Weekly/Daily project post can be created by moderator and pinned to the top. All the promoting content goes into the comment. We can explore and upvotes. Here are some illustrations: 1. Direct Promoting Post ​ 2. Indirect Promoting Post ​ Weekly/Daily Promoting Post by Moderator, Pinned to Top, Comments by project owner, upvotes/downvotes by us Which do you think is acceptable? Or you have better ideas? Leave a comment. It's a machine learning sub, don't make machine to solve it better than us. View Poll submitted by /u/Remote_Cancel_7977 [link] [comments]  ( 3 min )
    [R] Differentiable signal processing for optical communication with Google JAX
    Hey folks, I wrote a mini project based on JAX for optical communications signal processing. https://github.com/remifan/commplax I have a research article as a use case demo, https://remifan.github.io/gdbp_study/article.html This tool essentially implements adaptive DSP equalizers as stateful NN layers (thanks to Jax's explicit stateful syntax) implements compositor interfaces from scratch to wrap up those stateful layers with other regular NN layers so that they can be trained together Homebrew serial compositions of stateful layers It is a fun project for me and I feel JAX really elegantly fits this research use. What do you think about JAX? I appreciate your comments:) submitted by /u/StreetPrice1909 [link] [comments]  ( 1 min )
    [D] Tracking the hardware usage while running CV NN Model on a 1000 Images
    Hi guys, I've been working on a machine learning project and I wanted to see how hardware resources are being used when I run inference on let's say 1000 images. How could i calculate the CPU(running inference on CPU)/RAM workload in that timeframe? I'm running it on a Linux Ubuntu VM. Thanks in advance! submitted by /u/Fifi0912 [link] [comments]  ( 1 min )
    [D] Running interactive Python notebooks on HuggingFace Spaces
    I'm working on a framework Mercury for converting Python notebooks into interactive web apps. It can add widgets to the notebook based on the YAML configuration. End-user can tweak widgets values and execute the notebook. The resulting notebook can be downloaded as single-file HTML. Simple. The framework is built on Django+React. It is easy to deploy to Heroku or other cloud services. Recently, I made it possible to deploy it to Hugging Face Spaces (faster and larger machines than on free tier Heroku). The process of deployment is simple. You need to create a Gradio app on Spaces (my framework is not supported, yet ;) ). You need to add the app.py file that will run the Mercury server and upload the notebook. You can check the details in the docs. The HF Space with example notebook https://huggingface.co/spaces/pplonski/deploy-mercury submitted by /u/pp314159 [link] [comments]  ( 1 min )
    [D] Conditional GAN with multiple adversarial losses - Implementation?
    I would like to test the architecture from the following paper with a different dataset: https://www.mdpi.com/2072-4292/13/19/3834 The authors state that their objective function is the following: https://preview.redd.it/u78f27jb6nu81.png?width=1027&format=png&auto=webp&s=32790a67ec829a1e79b252edd0714b8b3b5a7f4e Where: -x is the real grayscale image. -s is its downsampled version, which should be used both as the initial imput of the generator performing the super-resolution and as a first conditional variable in the learning process. -e is another two-dimensional array containing values for a second additional conditional variable. The authors, however, state that this should be implemented by using two separate conditional adversarial losses, one for each of the conditional variables. To clarify, the first adversarial loss should be: AdvLoss1(ParametersG, ParametersD) = - Log(Discriminator(x,s) - Log(1-Discriminator(Generator(s),s) While the second would be: AdvLoss2(ParametersG, ParametersD) = - Log(Discriminator(x,e) - Log(1-Discriminator(Generator(s),e) Which should be then summed up for the backward pass. In my pytorch implementation, however, I have only been able to set up a unique adversarial loss, which could be defined as: CurrentAdvLoss(ParametersG, ParametersD) = - Log(Discriminator(x,(s,e)) - Log(1-Discriminator(Generator(s),(s,e)) I have tried to implement implemented as follows:(simplified version) which I calculate in the following training loop (simplified version, from the same question asked in the Pytorch forum) as errD and errG after conditioning the network on both s and e at the same time: https://discuss.pytorch.org/t/conditional-gan-with-multiple-adversarial-losses/149627 My question is, is there a way to modify the following loop to obtain outputs that have been separately conditioned only first on s and then on e and thus calculate the two separate adversarial losses originally proposed by the authors instead? submitted by /u/Franken91 [link] [comments]  ( 2 min )
    [D] IJCAI 2022 Paper Notification
    This is the discussion for accepted/rejected papers in IJCAI 2022. Results are supposed to release today. submitted by /u/errohan400 [link] [comments]  ( 1 min )
    [R] Authors Claim to Have "Solved" MNIST and CIFAR
    Paper: https://arxiv.org/abs/2204.07953v1 Code: https://github.com/decurtoydiaz/learning_with_signatures Tangential resources of interest: https://arxiv.org/abs/1905.08494, https://en.wikipedia.org/wiki/Rough_path#Signature, and https://labelerrors.com/ Personally, I believe from their code on Github, they have a possible data leakage (in the same vein of the current issue raised there) as well as an accuracy of 100% on a test set is fishier than a fish market. However, I am very curious to hear from the court of public opinion. How is everyone feeling about this? submitted by /u/blingblingbeepbeep [link] [comments]  ( 4 min )
    [D] How do I evaluate if my data represent the target variable before training a machine learning algorithm?
    I have a dataset of points cloud where each point in the point cloud has a variable. I am trying to relate the local geometry features to that point variable by using FPFH, This means I am generating my own features from the dataset by first using an area of n-points to compute normal-vector estimations and from x normal vector estimations to compute the FPFH. However, the numbers x and n are arbitrary and other combinations might describe the target variable better. So I wanted to know if there was a method to evaluate how good a given x and n value are at describing the target variable. I considered the correlation between the features (n,x) and the target variable but I read that this assumes linear combination redundancy. I am using scikit-learn. So basically I have features X(x,n) and a target variable Y. Which x and n, in the feature space X(x,n), describes the target variable, Y, best. I want to do it before the training because when I try to train it with my random forest regressor it takes 3-4 hours and I want to test for more combinations. submitted by /u/Neo-Rushdian [link] [comments]  ( 1 min )
    [Discussion] Training performance evaluation of MindSpore, a home-grown deep learning framework -- by ADSL Lab, CSU
    The article is reproduced from Zhihu, using deepl machine translation, for all enthusiasts to communicate Abstract Deep learning frameworks are the engines and motors for pushing the boundaries of artificial intelligence applications, and good deep learning frameworks can dramatically shorten the cycle of algorithm innovation and validation. In this report, we focus on the newly launched MindSpore framework, which has received a lot of industry attention, and systematically explore its model training speed on GPU clusters and compare it with popular international frameworks. In the evaluation experiments, we choose two classical models, ResNet and BERT-base, to test and analyze their performance with the same algorithm, the same dataset, and the same or similar performance hardware platf…  ( 7 min )
    [D] Why is the diffution model so powerful? but the math behind it is so simple.
    You can see the 200 lines code here: https://nn.labml.ai/diffusion/ddpm/index.html and https://github.com/cloneofsimo/minDiffusion, math is here: https://lilianweng.github.io/posts/2021-07-11-diffusion-models/ The algo is smart and simple, but it's generation result seems more incredible than GANs, and its speed is fast, the model size is not too big: https://openai.com/dall-e-2/ , https://huggingface.co/spaces/multimodalart/latentdiffusion, https://www.reddit.com/r/dalle2 So 1st question: why is diffusion model so powerful? Can someone explain it? 2st question: Has anyone used diffusion for NLP? ​ UPDATED: ​ \"A multiverse portal to a new world opening up above Tokyo\" by dalle2 (from r/dalle2) \"A robot painting on a canvas while playing the piano\" by dalle2 (from r/dalle2) ​ \"Mona Lisa in her studio painting Leonardo da Vinci \" by dalle2 (from r/dalle2) \"Science fiction illustration future city in the night | impressionism\" by latentdiffusion ​ \"Science fiction illustration of Beauty and monsters | impressionism\" by latentdiffusion ​ \"a painting of a girl with a fox sitting in a field at sunrise in the style of Claude Monet\" by latentdiffusion submitted by /u/ghosthamlet [link] [comments]  ( 4 min )
    [D] Questions about Intel 12th gen Alder Lake CPUs
    I am looking to build a new PC but have struggled to find the info on how Intel's latest CPUs perform for data science/ML, so if anyone is using one for that purpose and can help with one or more of these questions it would be very helpful! Apologies if these questions should be directed elsewhere. I am planning to use WSL2/Ubuntu but have heard that Intel's thread director isn't implemented well yet in Linux (or Windows 10!), so it doesn't assign tasks properly. Has anyone experienced issues with this firsthand? Assuming the thread director is working, are the e-cores utilised at all in any typical DS workflows? E.g. will they get used with joblib or when training scikit-learn/gbms in parallel? Are the e-cores good enough to handle other stuff like web browsing etc whilst the p-cores are maxed out on model training, or is it still necessary to keep at least one p-core free to avoid crashing the PC? Also I have read that in Windows 11 (where the thread director works best) that the active window/tab could be assigned p-cores as a priority, which isn't very helpful for someone who needs to train models in the background etc, but not sure whether this is actually happening in practice. The consensus from benchmarks/reviews is that the hybrid architecture 'just works' and is superior to AMD right now, but those benchmarks are primarily for use in gaming/video editing. submitted by /u/FightingLikeBeavers [link] [comments]  ( 1 min )
    [N] The new Machine Learning Specialization by DeepLearning.AI and Stanford Online is launching soon! Join the Waitlist.
    We’re thrilled to announce a brand new Machine Learning Specialization, in collaboration with DeepLearning.AI, launching in June on Coursera! Learn essential real-world skills from AI pioneer Andrew Ng, who co-founded Google Brain and Coursera, led AI research at Baidu, and has impacted millions of AI learners. This updated 3-course Specialization will cover the latest machine learning techniques as well as foundational AI concepts that made its predecessor one of the world’s most popular machine learning courses. Join the waitlist! https://preview.redd.it/yujr31t6vku81.png?width=5000&format=png&auto=webp&s=0f4c4ef090bcdc7cfb04ee2c817d766f23c236a6 submitted by /u/Stanford_Online [link] [comments]  ( 1 min )
  • Open

    How Meta's multiverse could prove our universe is a fake
    submitted by /u/estasfuera [link] [comments]
    SingularAgent - Many Methods Make Light Work
    submitted by /u/dantheman333 [link] [comments]
    General AI In Healthcare | Machine Learning For Cardiovascular Disease | Color Night Vision
    submitted by /u/getrich_or_diemining [link] [comments]
    A realistic image AI software
    submitted by /u/Eurokiwiboy [link] [comments]
    Ant colony simulation
    submitted by /u/Seitoh [link] [comments]  ( 1 min )
    Is there any free open source AI model available for answering any bible related queries?
    A few years back I developed a very simple app just to show a few bible verses. Though it is a very simple app, it got more than 50K installs without much promotion. So, I am thinking about promoting it. But hesitate to do it as it is very simple app. So, I would like to add some useful feature before start promoting it. I would like to add a feature which will allow the users to ask any question related to bible, and it should be giving relevant answer. I assume that some bible data is open source. Is there any free tutorial available to know about how to implement AI based chat system for answering any bible related queries after training with bible data. Is there any app already providing this feature? submitted by /u/qptbook [link] [comments]  ( 1 min )
    Today, AI is becoming ubiquitous, in and out of the workplace. With artificial intelligence (AI) becoming more powerful, the questions that surround AI ethics are becoming more relevant.
    But can technology be controlled to avoid adverse outcomes? Let's understand how AI will help us to make a better world. https://us.sganalytics.com/blog/top-ethical-challenges-in-ai-the-price-of-progress/ submitted by /u/JencyJane [link] [comments]  ( 1 min )
    Top Ethical Challenges in AI – The Price of Progress
    submitted by /u/JencyJane [link] [comments]
    Artificial Nightmares: Dr. Strange || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
    Weekly China AI News: Chinese Prominent AI Lab Plagiarizes Big Model Paper; Microsoft Research Asia Halts Internship Hiring from US-Banned Universities; Beijing Announces New RISC-V Chip Institute
    submitted by /u/trcytony [link] [comments]  ( 1 min )
  • Open

    French palindromes and Morse code
    I got an email from a student in France who asked about a French counterpart to my post on Morse code palindromes, and this post is a response to that email. Palindromes A palindrome is a word that remains the same when the letters are reversed, like kayak. A Morse code palindrome is a word […] French palindromes and Morse code first appeared on John D. Cook.  ( 2 min )
    Blaschke factors
    Blaschke factors are complex functions with specified zeros inside the unit disk. Given a complex number a with |a| < 1, the Blaschke factor associated with a is the function Notice the semicolon in b(z; a). This is a convention that a few authors follow, and that I wish more would adopt. From a purely […] Blaschke factors first appeared on John D. Cook.  ( 2 min )
  • Open

    AI Application Development Guide for Business Owners
    To start deeply investigating the AI app development process, it’s important to first understand how these projects differ from regular app…  ( 9 min )
  • Open

    Neural Network gets too large and dies
    Hi, I've been working on a project for my computer science class and everything has been working up until the training. I'm following a guide online that has worked fairly well. Whenever I try to train, however, I run into an overflow error and the entire network dies. I'm not sure where to go from here as I've tried a few steps to fix the issue, if anyone could offer up some advice to fixing my problem that would be amazing. submitted by /u/djm710 [link] [comments]  ( 2 min )
    7+ Best Books to Learn Neural Networks in 2022 for Beginners (Updated) -
    submitted by /u/maneesh123456 [link] [comments]
    Question about Sigmoid and Heaviside
    I read a paper and was a little bit confused: In the paper it said: "Imagine you have a two dimensional (binary input) classification (0 or 1) problem and you use Sigmoid as an acitivation function. Since the Sigmoid gives you a real number between 0 and 1, it's not really classification anymore. Therefore you take the input of Sigmoid (y_Sigmoid) and put this into a modified heaviside function H(y-0.5) (so for y_Sigmoid bigger than 0.5, it gives you yHeavi = 1) The decision boundary is given by a straight line w1a1+w2a2+w0=0 and this whole process, it only works with the Sigmoid function as first activation function." The last paragraph confused me. Why can I assume that the decision boundary is exactly that (It's just the "normal decision boundary" for a SLP, why does it work here a also) and why does it work with only Sigmoid Function as first activation function submitted by /u/LawlHeyman [link] [comments]  ( 1 min )
    Starting a neural network
    I want to create a program that can take music i feed it and over time create its own music based on the inputs. I know i have to use a neural network and deep learning algorithms but how do i get started. Thanks. submitted by /u/Saxy-Snark [link] [comments]  ( 2 min )
  • Open

    A First Course on Deploying Python Projects
    After all the hard work on developing a project in Python, we want to share our project with other people. It can be your friend or your colleagues. Maybe they do not interested in your code, but they want to run it and make some real use of it. An example is you created a […] The post A First Course on Deploying Python Projects appeared first on Machine Learning Mastery.  ( 10 min )
  • Open

    Offline RL Made Easier: No TD Learning, Advantage Reweighting, or Transformers
    A demonstration of the RvS policy we learn with just supervised learning and a depth-two MLP. It uses no TD learning, advantage reweighting, or Transformers! Offline reinforcement learning (RL) is conventionally approached using value-based methods based on temporal difference (TD) learning. However, many recent algorithms reframe RL as a supervised learning problem. These algorithms learn conditional policies by conditioning on goal states (Lynch et al., 2019; Ghosh et al., 2021), reward-to-go (Kumar et al., 2019; Chen et al., 2021), or language descriptions of the task (Lynch and Sermanet, 2021). We find the simplicity of these methods quite appealing. If supervised learning is enough to solve RL problems, then offline RL could become widely accessible and (relatively) easy to implemen…  ( 5 min )
  • Open

    DSC Weekly Digest 4/19/2022: The Case for Personal Knowledge Graphs
    I just moved. I’d like to say that I was highly organized, that I knew where every box ended up and what was in each box. I would be lying. Most people who move know the feeling of living in boxes even after the movers have left, the days spent dodging labyrinths of teetering cardboard,… Read More »DSC Weekly Digest 4/19/2022: The Case for Personal Knowledge Graphs The post DSC Weekly Digest 4/19/2022: The Case for Personal Knowledge Graphs appeared first on Data Science Central.  ( 4 min )
    How Microsoft Power BI Revolutionizes Business
    As cloud-based business intelligence becomes more and more popular in the market, one name has made quite a mark: Power BI. A Microsoft offering, Power BI is an interactive data visualization and analytics tool that promises to revolutionize business. Here are some of its key benefits to help you see how it can do that:… Read More »How Microsoft Power BI Revolutionizes Business The post How Microsoft Power BI Revolutionizes Business appeared first on Data Science Central.  ( 3 min )
    How AI and ML are transforming data quality management?
    Introduction In recent years technology has become prominent, both at work and at home. Machine learning (ML) and Artificial Intelligence (AI) are evolving quickly today. Almost everyone will have some interaction with a form of AI daily. Some common examples include Siri, Google Maps, Netflix, and Social media (Facebook/Snapchat).AI and ML have popularly used buzzwords… Read More »How AI and ML are transforming data quality management? The post How AI and ML are transforming data quality management? appeared first on Data Science Central.  ( 4 min )
    Agile, Agile 2 and Agility, Part II
    In the previous article in this series, we discussed the difference between Agile and business agility and how Agile 2 addresses some of the omissions and failings of traditional Agile.  Both Agile and Agile 2 focus on accelerating digital development; however, the benefits of any Agile approach can be obviated if it is not implemented… Read More »Agile, Agile 2 and Agility, Part II The post Agile, Agile 2 and Agility, Part II appeared first on Data Science Central.  ( 4 min )
  • Open

    Search for knowledge in Quip documents with intelligent search using the Quip connector for Amazon Kendra
    Organizations use collaborative document authoring solutions like Salesforce Quip to embed real-time, collaborative documents inside Salesforce records. Quip is Salesforce’s productivity platform that transforms the way enterprises work together, delivering modern collaboration securely and simply across any device. A Quip repository captures invaluable organizational knowledge in the form of collaborative documents and workflows. However, finding […]  ( 6 min )

  • Open

    Do regulatory data projects really need design-time data lineage? Probably not.
    Your regulatory data project likely has no use case for design-time data lineage. tl/dr Mapping Data Lineage at design time, for its own end, has no regulatory use case or ROI.  Buying a specialist tool to support that mapping has even less ROI.  Regulations see that kind of documentary data lineage as ancillary at best.… Read More »Do regulatory data projects really need design-time data lineage? Probably not. The post Do regulatory data projects really need design-time data lineage? Probably not. appeared first on Data Science Central.  ( 10 min )
    Dark Energy, Dark Data
    During the 1990s, the physics community began to measure the brightness of certain supernovae in a novel way. This new method supported the conclusion Edwin Hubble had first arrived at in 1929 after discovering that galaxies are becoming more and more distant from us: Dark matter and dark energy play a role in why those… Read More »Dark Energy, Dark Data The post Dark Energy, Dark Data appeared first on Data Science Central.  ( 4 min )
    5 Main Benefits of Distributed Cloud Computing
    According to the predictions of Garter, by 2024, distributed cloud computing opportunities will be offered by most cloud vendors on a service basis. With the increasing rush in the cloud space and digitalization of documentation, this industry is bound to grow. Understanding Distributed Cloud Distributed cloud is an innovation to traditional cloud computing. It means… Read More »5 Main Benefits of Distributed Cloud Computing The post 5 Main Benefits of Distributed Cloud Computing appeared first on Data Science Central.  ( 5 min )
  • Open

    Speeding Up AI Algorithms- Inferencing challenges at the edge
    submitted by /u/Chipdoc [link] [comments]
    Build & share machine learning apps directly in browser using Gradio in Python
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    What if You Are a Prototype for the Ultimate Sentient Artificial Intelligence?
    submitted by /u/IndependenceFun4627 [link] [comments]  ( 1 min )
    Overview of Relational Graph Convolutional Networks (RGCN)
    submitted by /u/aidev2040 [link] [comments]
    There are so many crappy chatbots, cause people don't pay attention on how it's performing. If you're one of them, here are metrics to keep in mind
    Hi there! Chatbots are not the "set and forget" thing like many other software. If you want to achieve great results with your chatbot, you need to improve it constantly. To know where and what to improve, you need to track and monitor chatbot analytics and the main chatbot metrics. General chatbot metrics Total number of users User satisfaction Accuracy of the chatbot Engagement metrics Active users New users Conversation Length Retention Rate Bounce Rate Flow Completion Rate Conversational analytics Goal Completion Rate (GCR) Fallback Rate Human Takeover Rate * Bonus: Revenue metrics Revenue generated ROI / payback period Here in the article we covered how to calculate each metrics, and you can find needed metrics depending on the industry you working in https://botscrew.com/blog/chatbot-metrics/?utm_source=RedditArtificial&utm_medium=&utm_campaign=&utm_term=&utm_content= submitted by /u/Avandegraund [link] [comments]  ( 1 min )
    Wake-up Call for Science – AI System Develops 40,000 Chemical Weapons in 6 Hours
    submitted by /u/TheCnt23 [link] [comments]
    Stopping 'them' from spying on you: New AI can block rogue microphones
    submitted by /u/KelliaMcclure [link] [comments]
    Stopping 'them' from spying on you: New AI can block rogue microphones
    submitted by /u/KelliaMcclure [link] [comments]  ( 2 min )
    which courses are good for complete beginners?
    Hello everyone , can someone recommend me for some good courses to do , I saw some courses on udemy , this one worth it? https://www.udemy.com/course/artificial-intelligence-az/ or I can learn everything on youtube? there are few more on udemy but I don't know how good they are .. is it worth buying one of those or there are better videos on youtube? EDIT : I found another 4 courses : https://www.udemy.com/course/100-days-of-code/ https://www.udemy.com/course/complete-python-bootcamp/ https://www.udemy.com/course/python-for-data-science-and-machine-learning-bootcamp/ https://www.udemy.com/course/machinelearning/ Which one of them would you recommend the most? submitted by /u/Edrixor [link] [comments]  ( 1 min )
    AI will make us dumb: [2204.07888] AI, Ageing and Brain-Work Productivity: Technological Change in Professional Japanese Chess
    submitted by /u/kg4jxt [link] [comments]  ( 1 min )
    Any good resources to learn Default Theory?
    I am having a difficult time understanding Default Theory and the various methods e.g Makinson to find the extension of default theories submitted by /u/cocag13996 [link] [comments]
    I know that the voice in this video is made using Replica Studio's engine, but does anyone know which voice exactly was used?
    This I looked through the available ones, not a single one seems to match it. Sorry if this isn't the right sub to ask, but since Replica Studios doesn't have its own sub I don't know where submitted by /u/AxySmarts [link] [comments]  ( 1 min )
    Artificial Nightmares: Schizophrenia || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
  • Open

    [D] Resources for Images Anomaly Detection
    Hello all, I know that there is a lot going on this field. I would like to get started on it, study more.. And as always, I like to start from the basis. Do you have any resource (video, article, book) good to star with? I know there are Autoencoders and Statistical models.. But how to know more, where/how do you keep studying? submitted by /u/bollolo [link] [comments]  ( 1 min )
    [R] Where can I find case studies on different ML projects?
    I am working on my research paper and would like to find resources which show the case studies of ML projects from the beginning to the end, doesn't matter if it failed or succeeded. submitted by /u/mkonu [link] [comments]
    [D] Create Labels for Data created by a GAN
    Hello there! I hope you have a great day! Currently I want to compare how good multiple GANs (Vanilla GAN, WGAN, DCGAN, ...) are for a given use case. Therefore I trained the various GAN versions with data of two different classes (i.e. apple and banana). Now I want to show that data I generate with the Generator can be used to train i.e. a classifier that can distinguish between real images of apples and bananas. Can I somehow create labels for the data I generate with the Generator in a smart way? So that I know that a generated image of the generator should for example be an apple? How do i do that? submitted by /u/Bonkikong [link] [comments]  ( 1 min )
    [P] Luminide: new optimization Early Ranking achieves higher accuracy AI models
    Luminide introduces a new optimization, called Early Ranking, which makes it easier to build better AI models. Early Ranking achieves the same AI training results with up to 10x less compute – this saves time, reduces costs, and increases model accuracy. Luminide's IDE is a customized version of JupyterLab with integrated AI dev tools. Luminide used Early Ranking to place Top 1% in the CVPR Plant Pathology Kaggle competition. You can read about how we developed our winning model, and you can too, in our new blog post: Better Automation for Higher Accuracy AI Models. Class activation maps give insights into Luminide's winning model. Luminide is a new cloud platform for AI model development. Check out our demo video for a quick overview, or try it for yourself (sign up today and receive 100 hours of free GPU cloud compute). submitted by /u/LuminideInc [link] [comments]  ( 1 min )
    [D] generic discussion on freelance ML engineers
    Hi, reddit. Recently, I'm looking into freelancer career path. Currently, I'm a researcher at a top company. So far, I know there is toptal, upwork, and freelancers. Checked them out, and seems toptal you still end up working for large corporate and mostly end up as full-time contractor which is not really a different or better option than my current work. Freelancers has too many bidders from developing countries. Besides what platform to use, i have more questions in terms of what obstacles we are facing to be freelancer ML engineer? Even though i am in AI and a researcher, but i have never deployed a model in production. Usually a task at big company takes a team or multiple teams to complete the MLOPs lifecycle, how can you do it as a single person? Any sharing of experience would be of great help. submitted by /u/meame2010 [link] [comments]  ( 2 min )
    [R] Looking for AI/ML experts from Southeast Asia to interview for master thesis
    Hello everyone, I am a student from Germany writing my master thesis on Digital Transformation in ASEAN with AI/ML. For my thesis I would like to interview AI/ML experts from the ASEAN region to talk about the digital development of each country, challenges and potentials. (If you are not native there, but you have a work connection or just knowledge about the region and its AI development, I appreciate that as well.) It would be awesome if some of you were open to talk to me. A few sentences are enough, I won't take much of your time. If you want, we can do a video call as well. I will quote you of course. Thank you guys. submitted by /u/BlueLagoon357 [link] [comments]  ( 1 min )
    Dealing with numerically 0 likelihood in probabilistic models [R]
    I'm trying to find literature on solving the following issue: In most probabilistic ML models, we model the joint distribution over a set of random variables, p(x1, ..., xN). If N is very large (e.g. 100, 500, or even 1000), then regardless of how you model this, the distribution's highest point of density is still quite tiny. E.g. if you consider an isotropic multivariate gaussian of 100 dimensions, the highest point of density will be somewhere in the neighbourhood of 1.6e-40. So when it comes time to evaluate log likelihood for a model like this, the probability is numerically 0, so the log probability goes to negative infinity. ​ Is there work around solving these kinds of issues? I.e. by constraining the model in some way, or scaling model output, etc? I've done some googling, but am having a hard time finding papers on the subject. Not even sure what to call the problem... Curse of dimensionality in PGMs? ​ Any recommendations of papers / talks / etc is greatly appreciated! submitted by /u/CS_Student95 [link] [comments]  ( 1 min )
    [R][P] GAN-Control: Explicitly Controllable GANs + Gradio Web Demo
    ​ https://i.redd.it/v61jw1fekiu81.gif Abstract: We present a framework for training GANs with explicit control over generated facial images. We are able to control the generated image by settings exact attributes such as age, pose, expression, etc. Most approaches for manipulating GAN-generated images achieve partial control by leveraging the latent space disentanglement properties, obtained implicitly after standard GAN training. Such methods are able to change the relative intensity of certain attributes, but not explicitly set their values. Recently proposed methods, designed for explicit control over human faces, harness morphable 3D face models (3DMM) to allow fine-grained control capabilities in GANs. Unlike these methods, our control is not constrained to 3DMM parameters and is extendable beyond the domain of human faces. Using contrastive learning, we obtain GANs with an explicitly disentangled latent space. This disentanglement is utilized to train control-encoders mapping human-interpretable inputs to suitable latent vectors, thus allowing explicit control. In the domain of human faces we demonstrate control over identity, age, pose, expression, hair color and illumination. We also demonstrate control capabilities of our framework in the domains of painted portraits and dog image generation. We demonstrate that our approach achieves state-of-the-art performance both qualitatively and quantitatively. submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 1 min )
    [D] Who funds the leading conferences in the field?
    I know that the publishers of the leading journals are mostly for-profit organization, that is weird because as researchers in the field we really “volunteer” for a free peer review or even pay to publish papers and read papers. On the other hand, i wasnt able to find information about the funding and profit goals of the leading conferences. Take NeuroIPS for example, i found that it is organized by “NeurIPS Foundation” but what exactly is this foundation - i couldn’t find any information about this subject. My point is, if the conferences are non-profit, sounds like they should be preferred over funding a for-profit organizations. submitted by /u/Careful_Winner_2335 [link] [comments]  ( 1 min )
    [D] NLP has HuggingFace, what does Computer Vision have?
    I've been writing tutorials with Pinferencia and HuggingFace. HuggingFace is quite handy and easy to use. I want to write some tutorial about computer vision afterwards. Is there anything similar in Computer vision area? submitted by /u/Remote_Cancel_7977 [link] [comments]  ( 2 min )
    [D] Why no paper in Speech Emotion Recognition train on multiple datasets ?
    I took a look at multiple of them and I was curious why they seemed to benchmark on multiple datasets but for the training, they restrained themselves to only 1 for training instead of merging them. From that they get good scores on the one they trained on, but bad ones for the rest. submitted by /u/raysamram [link] [comments]  ( 1 min )
    [P] SparseServer.UI : A UI to test performance of Sparse Transformers
    You can now load multiple transformers (each model has a unique sparsification recipe) on top of the DeepSparse server behind Streamlit, and it's open-source. This was battle tested on a 16GB of RAM with only 4 core CPU virtual machine. These compute requirements are enough to load up to 19 sparse BERT models in memory and compare their performance on question answering (P.S. they are really fast on just CPUs). 💻code: https://github.com/neuralmagic/deepsparse/tree/main/examples/sparseserver-ui submitted by /u/Quantum_Stat [link] [comments]  ( 1 min )
    [Research] Learning with Signatures
    This paper reports "results on AFHQ dataset, Four Shapes, MNIST and CIFAR10 achieving 100% accuracy on all tasks." The authors used few-shot classification "by comparing each test sample (after optional augmentation and computation of the element-wise mean) against a representative element-wise mean signature computed by averaging the signatures of a given number of train samples." What are your thoughts on this? Learning with Signatures - https://arxiv.org/abs/2204.07953 submitted by /u/Marmadelov [link] [comments]  ( 1 min )
    [Project] [Research] Simple Speech Recognition System
    Github - Bangla Spoken Number Recognition Dataset - Our custom dataset on Bangla Numerals Publications - Though its on (0-9) digits We have created a simple speech recognition system for recognizing Bangla numerals from '০-৯৯'(0-99). In this project, audio samples from different genders, age groups, and dialects of Bangladeshi people were used to create a speech dataset of spoken numbers from '০-৯৯'(0-99). The raw speech data is subjected to various audio augmentation techniques such as time shift, speed tuning, background noise mixing, and volume tuning. Then, to extract meaningful features from the data, Mel Frequency Cepstrum Coefficients (MFCCs) are used. We have used, Convolutional Neural Networks (CNNs), to develop a Bangla number recognition system. The proposed method recognizes '০-৯৯'(0-99) Bangla spoken numbers with 89.61% accuracy across the entire dataset. The model’s effectiveness was also tested using 10-fold cross-validation, with 89.74% accuracy for recognizing '০-৯৯'(0-99) Bangla spoken numbers across the entire dataset. I Hope, this work will help you in some way. :) submitted by /u/PIASR0Y [link] [comments]  ( 1 min )
    [P] Improving mulitclass classification accuracy with Jain's Fairness Index
    This is a light implementation of the idea in the paper Leveraging Uncertainties in Softmax Decision-Making Models for Low-Power IoT Devices. Instead of finding uncertainties I have added Jain's Fairness Index as a addition to the loss function. Gist: https://gist.github.com/Gananath/8d167384da7d3bc078650c73fab1a8dd submitted by /u/gananath [link] [comments]
    [D] Are workshop papers considered "final publications"?
    Specifically, I'm talking about workshops of major conferences (NeurIPS, ICLR, ICML, etc.). If I submit a paper and it gets accepted, is that workshop paper a "final publication"? Or would most people expect the project to continue being developed into a slightly larger/longer paper for submission to the main stream of a conference? And if so, does publishing the earlier workshop paper tend to hinder or harm the later conference submission? I recognise there's a variety of workshops, and perhaps each have different expectations or norms. I'm wondering, from my outsider's perspective, how can I tell? For example, I have been thinking about submitting to one of these ICML workshops: https://icml-compbio.github.io/ or https://www.tagds.com/workshops/tag-in-machine-learning. Is there an easy way to tell whether either or both of these are "final publication" venues or not? submitted by /u/tfburns [link] [comments]  ( 2 min )
    [R] Maximum likelihood estimation can fail due to "Manifold Overfitting"
    arXiv: https://arxiv.org/abs/2204.07172 This paper out today seems to make the bold claim that maximum likelihood estimation is not a well-posed training objective in deep generative modelling. The manifold hypothesis says that observed high-dimensional data clusters around low-dimensional manifolds, but maximum likelihood methods (e.g. VAE, normalizing flows) learn high-dimensional densities. The paper argues that the mismatch between dimensionalities will lead to a problem called "manifold overfitting". Models are able to maximize likelihood in high-dimensions by sending the density to infinity around the low-dimensional manifold, but they can do this while completely ignoring the distribution of data on the manifold. So in other words, high capacity models will learn the data manifold…  ( 5 min )
  • Open

    "Reinforcement Learning with Action-Free Pre-Training from Videos", Seo et al 2022
    submitted by /u/gwern [link] [comments]
    "Inferring Rewards from Language in Context", Lin et al 202
    submitted by /u/gwern [link] [comments]
    Bandit problems as sequential decision problems
    Any reinforcement learning problem can be modeled as a sequential decision problem (SDP), which can always be modeled as a Markov decision process (need to model the state carefully). An example of an SDP is a multiarmed bandit problem, where the state is the vector of beliefs about the performance of each arm (or beliefs about a continuous parametric model). Decisions are made by a policy, and there are four classes of policies. For some reason, the RL community tends to focus on just one of the four classes (UCB policies, which fall in the class of cost function approximations), but there are entire communities using each of the other three classes. See chapter 7 of my new book (https://castlelab.princeton.edu/RLSO/) for a complete summary of the four classes of policies for pure learning problems (aka bandit problems). Note that Sutton and Barto (2nd edition) cover bandit problems in chapter 2, and then introduce MDPs in chapter 3. A bandit problem *is* an MDP! submitted by /u/powell-sda [link] [comments]  ( 1 min )
    Getting started with UAV/drone control
    Hi, is it currently possible to train a UAV and implement the policy it in real-life? I understand there are different environments for training, e.g. AirSim, GymFC, and others. However, the interesting part for me is the link to the real world: Is there a way to directly implement any learned policy on a real drone, e.g. a commercially available quad-copter? Which UAV would support such a functionality? I'd love to get started on training drones for RL purposes (search and rescue, etc), but if there is no way to test it in real-life then this would be disappointing. submitted by /u/FrankTheThanks [link] [comments]  ( 1 min )
    Question about Expected Sarsa for prediction vs control
    I am having a hard time figuring out what makes the difference between Expected Sarsa for prediction vs for control. For off-policy Expected Sarsa I believe it's possible to use one epsilon value for a target policy that is epsilon-greedy and another epsilon value for a behaviour policy that is epsilon-greedy. The target policy would be used within the expected value calculation in the update of Q(S,A), the action value function, and the behaviour policy would be used to choose actions from the current state. But I'm not sure how to differentiate between the control version of the algorithm compared to the prediction version though. I think prediction usually finds the state-value function but I know that on-line Sarsa for prediction uses Q(S,A) so I'm not sure how to determine the difference between prediction and control algorithms. submitted by /u/lifelifebalance [link] [comments]  ( 1 min )
    exploration strategies in discrete action spaces
    Hello there, I am working on missile command game, and as a baseline I mostly use rllib/ppo. The algorithm never converges, I suspect it is because of the lack of exploration. Since the timesteps are small, the target usually oscillates around center of the screen, it is impossible to explore to go near the border and then explore to fire (to counter incoming missile). What methods should I try? Moreover, I have already done reward scaling and frame staking. Any suggestions regarding solving this game is much appreciated. Last question, do you now similar (and common) environments that is solved, maybe solutions show the path to follow.Thank you :) submitted by /u/Street_Excitement_14 [link] [comments]  ( 1 min )
    Confusion of hyperparameters in ppo
    I'm reading the ppo paper https://arxiv.org/abs/1707.06347 and I'm confusing about the hyperparameters in table 4, Log stdev. of action distribution | LinearAnneal(-0.7, -1.6). Best to my knowledge, under the continuous setting, the policy will output mean and std, so why the stdev of action distribution is given as a hyperparameter, and also what is LinearAnneal in detail. submitted by /u/StrawberryTemporary7 [link] [comments]  ( 1 min )
    Need help about categorical dqn
    I dk how the projection of TZ to match Z work and I also dont understand the formula? can someone do step by step calculation to demo? submitted by /u/Professional_Card176 [link] [comments]  ( 1 min )
  • Open

    Overview of Relational Graph Convolutional Networks (RGCN)
    submitted by /u/aidev2040 [link] [comments]
    This is a long shot, but does anyone remember...
    Hi, this is a very long shot. I have been trying to remember the name of a science TV show which aired in the UK back in the 90's. It focussed on Neural Networks and gave some brilliant examples of environmental sensing. There was also a section showing a simple voice synthesiser which "babbled" like a child. I thought it may have been an "Horizon" show, however, I have been through the list of shows from that time and none appear to be right. If anyone has a memory of this show please let me know. One of the visuals I remember was a plastic skull with an LED matrix inside showing patterns. Obviously this was just some smoke and mirrors, however, it may trigger a memory. I'm trying to recall something from best part of 30 years ago.. submitted by /u/_m0xya_ [link] [comments]  ( 1 min )
  • Open

    10 seats remaining | A series of live ML strategy workshops
    Sponsored Post Unlike traditional online courses, Foster Provost’s workshops will give you the chance to engage live with a world-class […] The post 10 seats remaining | A series of live ML strategy workshops appeared first on Machine Learning Mastery.  ( 2 min )
  • Open

    Learning to Prompt for Continual Learning
    Posted by Zifeng Wang, Student Researcher, and Zizhao Zhang, Software Engineer, Google Research Supervised learning is a common approach to machine learning (ML) in which the model is trained using data that is labeled appropriately for the task at hand. Ordinary supervised learning trains on independent and identically distributed (IID) data, where all training examples are sampled from a fixed set of classes, and the model has access to these examples throughout the entire training phase. In contrast, continual learning tackles the problem of training a single model on changing data distributions where different classification tasks are presented sequentially. This is particularly important, for example, to enable autonomous agents to process and interpret continuous streams of informati…  ( 7 min )
  • Open

    Integrate ServiceNow with Amazon Lex chatbot for ticket processing
    Conversational interfaces (or chatbots) can provide an intuitive interface for processes such as creating and monitoring tickets. Let’s consider a situation in which a recent hire on your team is required to cut tickets for office equipment. To do so, they have to interact with a ticketing software that the organization uses. This often requires […]  ( 10 min )
  • Open

    Inversion in a circle
    Inversion in the unit circle is a way of turning the circle inside-out. Everything that was inside the circle goes outside the circle, and everything that was outside the circle comes in. Not only is the disk turned inside-out, the same thing happens along each ray going out from the origin. Points on that ray […] Inversion in a circle first appeared on John D. Cook.  ( 2 min )
  • Open

    Don’t let data drift derail edge compute machine learning models
    Edge computing has come of age, with deployments enabling many applications that process data from IoT sensors and cameras. In 2017, we identified the symbiotic relationship between edge computing and video analytics in an article, noting that live video analytics is the “killer app” for edge computing. Edge devices come in various shapes and sizes […] The post Don’t let data drift derail edge compute machine learning models appeared first on Microsoft Research.  ( 5 min )
  • Open

    A Quick Guide To Find The Right Minds For Annotation Is So Famous, But Why?
    Shared duties have always been the most critical component of every successful organization, regardless of its nature or size. When it…  ( 4 min )
  • Open

    Welcome ‘In the NVIDIA Studio’: A Weekly Celebration of Extraordinary Artists, Their Inspiring Art and Innovative Techniques
    Creating content is no longer tethered to using paint and stone as mediums, nor being in massive studios. Visual art can now be created anywhere, anytime. But being creative is still challenging and time-consuming. NVIDIA is making artistic workflows easier and faster by giving creators tools that enable them to remain in their flow state. Read article > The post Welcome ‘In the NVIDIA Studio’: A Weekly Celebration of Extraordinary Artists, Their Inspiring Art and Innovative Techniques appeared first on NVIDIA Blog.  ( 4 min )

  • Open

    whats your hopes and worry about future humaniod Artificial intelligence coming soon?
    submitted by /u/Upset_Force66 [link] [comments]
    AI Dream 36 - Psychedelic Special (4K 40Mbit Test)
    submitted by /u/LordPewPew777 [link] [comments]
    AI Startups and the Hunt for Tech Talent in Vietnam
    submitted by /u/regalalgorithm [link] [comments]
    We don't have echolocation
    submitted by /u/tezdhar [link] [comments]
    These 3-Michelin-starred plates were invented by AI. The food doesn’t even exist
    submitted by /u/jonfla [link] [comments]
    Getting in shape while homeworking by force locking the screen and using blazepose pose estimation to detect pushups to unlock it again.
    submitted by /u/ThePyCoder [link] [comments]
    Last Week in AI: AI chip startup funding doubled in the last 5 years, new AI applications in hospitals and restaurants, Cruise robotaxi pulled over by police in SF, and more!
    https://lastweekin.ai/p/163?s=w submitted by /u/regalalgorithm [link] [comments]  ( 1 min )
    Youtubers create a completely AI "influencer."
    submitted by /u/savetheattack [link] [comments]
    FOMO is a TinyML neural network for real-time object detection
    submitted by /u/bendee983 [link] [comments]
    An online course with an AI tutor achieves a significantly higher completion rate than traditional online courses thanks to a personalized learning experience.
    submitted by /u/much_successes [link] [comments]  ( 1 min )
    Protein Folding Neural Networks (e.g RoseTTAFold) Are Not Robust
    submitted by /u/qptbook [link] [comments]
    Witch of the Barthe
    submitted by /u/Hacknaut [link] [comments]
    Why is it called tensorflow and not matrixflow?
    Hello, I'm MB. A very nice and polite guy. Why is it called tensorflow and not matrixflow? AI is all about matrix multiplications, right? So why use the word tensor instead? I know what a tensor is, kind of. But isn't AI about matrix multiplications primarily rather than tensor multiplications. ELI5 please. submitted by /u/MountBlanc [link] [comments]  ( 2 min )
    Society
    submitted by /u/booksmoothie [link] [comments]
    Bioinspired multisensory neural network with crossmodal integration and recognition
    submitted by /u/booksmoothie [link] [comments]
    Realistic animal movement
    I am working on a robotic pet that has lots of movement capability but is simply scripted and will unnaturally jump between movement sets without considering the current movement. What branch of AI should I look into leaning about? Currently I use mostly python for high level and C for microcontrollers. submitted by /u/uMinded [link] [comments]
  • Open

    A3C vs federated learning?
    Hi, I see this question was asked before but I am still not convinced there is a difference between the two. How is asynchronous distributed RL (A3C) and federated learning different? It seems like the basic idea behind them is the same— the agents train in their own environments and only share gradients with the server. Is the difference only in terms of the domain they are applied in? Is it just ML vs RL? submitted by /u/uneasy_daisy [link] [comments]  ( 1 min )
    Can polyak averaging neural networks lead to numerical instability?
    In Soft Actor Critic several Q networks are used. Target Q networks are gradually updated to match other Q networks. See step 15 here: https://spinningup.openai.com/en/latest/algorithms/sac.html#pseudocode I've heard this called polyak averaging. Let's say we have two weights from two neural networks: W1 from one network, and W2 is the corresponding weight from the other network. Polyak averaging averages these weights as follows: W_average = W1 * p + W2 * (1-p) When p is 0.5, it's a evenly weighted average. If p is high, then W1 is weighted more heavily than W2, etc. My question is: Does this method of averaging weights lead to numerically unstable neural networks? This technique is often used to gradually transform one neural network into another on a weight by weight basis, but there is no guarantee that all intermediate neural networks are well behaved (at least, none that I'm aware of). Whereas, gradient descent with small enough step sizes should, theoretically, keep a neural network well behaved, I think those same theoretical guarantees apply to polyak averaging neural networks. What do you think? submitted by /u/Buttons840 [link] [comments]  ( 2 min )
  • Open

    [D] Word Meaning Dictionary Dataset
    Hey all! So I intend to make an application that, very naively speaking, outputs synonyms of a given word regardless of context (like if word1 is "bank", the model should output both "money" and "river", and the order does not matter). For this, I intend to use a Doc2Vec type of classifier, where the meanings of each word can serve as a document, and then similar words can easily be returned using a cosine similarity function. I chose this over a classic Word2Vec as this will be able to predict uncommon words (which blimey the English language has a lot of) which would otherwise be processed as tokens. To this end, I am searching for a suitable dataset. Any ideas? submitted by /u/GrammarPaparazzi [link] [comments]  ( 1 min )
    [D] Is there a way to use a series of videos as the predictor variable for prediction/regression?
    This is the problem area I am working with: I have a series of videos taken at different times, and each video is paired with a physical variable. The videos contain information that correlates with the physical variable. What we want to do is use the information encoded within each video to build a correlation model with the physical quantity, and thereafter use new videos to predict the physical quantity. (We want to avoid the route of video -> CNN -> extract parameters -> build model with parameters. Instead, we want to directly go from the videos to the model without separately extracting parameters.) So, in a way, I want to use a series of videos as a time series data set. Is there a way to do this? What should be the starting point for my research into this? Thanks in advance! I am not an expert with this area at all, and would greatly appreciate guidance from the community. submitted by /u/besse [link] [comments]  ( 1 min )
    [P] Blog post + open-source PyTorch implementation of DeepMind's SIMONe (unsupervised scene decomposition)
    Hi all! My team recently reproduced and published a PyTorch implementation of the paper SIMONe: View-Invariant, Temporally-Abstracted Object Representations via Unsupervised Video Decomposition. Our blog post walks through the code and provides a detailed explanation of the architecture they use in order to perform object segmentation on videos in a fully self-supervised manner. Hope this is helpful/interesting to others! submitted by /u/ai_ellie [link] [comments]  ( 1 min )
    [D] AutoRF vs SinNeRF
    Both approaches seem to be able to render complex scenes from a single view, without the need for explicit priors or pretrained feature extractors. Conveniently, AutoRF doesn't mention SinNeRF. What are the similarities and differences among the two approaches? DISCLAIMER - I'm not a NeRF expert. My limited understanding of it is that we train a small MLP to regress the radiance field for a scene, i.e., to predict emitted radiance at a point (x,y,z) in the viewing direction (θ, φ). Once we have the radiance field, we can use some rendering engine to render a 2D view from the 3D field and the camera parameters. EDIT: I just realized that I didn't link the papers, how silly of me. Here they are: SinNeRF: https://arxiv.org/abs/2204.00928 AutoRF: https://arxiv.org/abs/2204.03593 ​ ​ submitted by /u/Best-Neat-9439 [link] [comments]  ( 1 min )
    [P] Evaluating automatic paraphrasing via BLEU, LaBSE, Perplexity and Jaccard similarity index - how we do it for Linguix Paraphraser 2.0
    Hey everyone! Our NLP team, led by our expert Daria, has recently released a new AI-based paraphrasing feature – Linguix Paraphraser 2.0. To measure its quality, we use four important metrics: BLEU, Jaccard similarity index, LaBSE and Perplexity. Performance stats: BLEU, which is used for measuring the quality of machine translation. The lower it is for rephrase task, the better. Right now, Linguix Paraphraser 2.0 has the BLEU metric of 0.47 (previous iteration had 0.65). So, we can say that our paraphraser is now smarter, it uses more words to rewrite the sentence, but the overall idea of the content is still preserved. Jaccard similarity index is used to measure the likeness of x and y objects. The same as with BLEU, the lower the index for the task, the better. Our current metric is 0.45 compared to 0.51 for the previous iteration. LaBSE metric is used to measure the semantic similarity of two sentences. It translates text into vectors so that vectors of texts close in meaning are geometrically close to each other. The higher the metric, the better. The new model has LaBSE similarity slightly less than the previous model: 0.80 vs 0.93, which is normal and correct, because the model generates a variety of variants using other words, but keeping the meaning of the source text in the target. Perplexity is used to ensure the rewritten content sounds natural (lower perplexity is better). The naturalness of the rewrites generated by our new paraphraser is much better than before: 0.26 vs 4.99 for the prior version. ​ https://i.redd.it/iaaf7o2iibu81.gif As such, for Linguix Paraphraser 2.0 we were able to improve the quality of the rephrased content, while keeping the text meaning at the same level. P.S. Daria is somewhat shy, so I asked her to share the update here on her behalf. Anyway she'll be pleased to see some feedback! submitted by /u/alexlash [link] [comments]  ( 1 min )
    [R] [P] Slideflow: a deep learning framework for digital histology
    Hi all - I'm an applied ML researcher working in an oncology research lab at U Chicago, using digital slides of patient's tumors for tumor classification, prognostication, and treatment response prediction. I'm really excited to share with the community the deep learning tools we've been using, and I'm hoping for any feedback you might have (or direction if you think there's a community or subreddit this might be better suited for). After years of development, we've released our open-source deep learning framework for digital histology, Slideflow (https://github.com/jamesdolezal/slideflow). It has flexible and highly optimized whole-slide image processing, support for a wide variety of existing and custom architectures (with continuous, categorical, or time-series outcomes), real-time digital stain normalization, a number of explainability tools, and integrated uncertainty quantification. It's compatible with both Tensorflow and PyTorch, available on PyPI and DockerHub, and comes with good documentation (https://slideflow.dev/). We've tried out a number of alternative frameworks over the years, and I think the ease of use, flexibility, and performance optimizations set it apart from other repos you'll find on GitHub. We have a handful of local collaborators who are using Slideflow, but I'm hoping to expand our reach and find people in similar fields who are interested in collaborating for ongoing open-source development. I've tried looked for subs relating specifically to computational pathology / digital histology, and haven't found a good community yet - anyone have ideas for how to get connected with like-minded people working in the same field? submitted by /u/shawarma_bees [link] [comments]  ( 2 min )
    [D] Anyone using named tensors or a tensor annotation lib productively?
    It seems like there have been some options out for a while now - e.g. native pytorch named tensors, tsalib, torchtyping - yet I haven't really seen them discussed or used in any code I've come across. Just wondering if anyone has surveyed them recently and is using them. In particular tsalib's warp string syntax for transformations looks really interesting. submitted by /u/patniemeyer [link] [comments]  ( 1 min )
    [D] Are there any analog A.I. computing chips on the retail market yet?
    If so, where to buy them? (for example: I red that mythic has been collecting funding in mid-2021, but I dont know if they are for sale anywhere). submitted by /u/GerritTheBerrit [link] [comments]  ( 2 min )
    [N][R][P] High fidelity 3D face reconstruction from monocular image
    FaceNext is an open source PyTorch library for high fidelity 3D face reconstruction from single/multiple RGB image(s). github.com/abdallahdib/NextFace ​ https://reddit.com/link/u6e7cd/video/ixg0wlzirau81/player submitted by /u/Abd_dib [link] [comments]
    [D] Which keywords describe my task?
    Hey all, I have received a task in an area I am unfamiliar with and need a little help finding suitable papers, so I am looking for keywords. To illustrate the goal, let's say you have 10000 screws (which can be of the same model) and you want to be able to recognize/match each one. You want new screws to be added all the time, so you also want the case that the object could previously be unknown when performing the match. The goal is to develop a capturing system that produces suitable images and to find an architecture/algorithm that is as robust as possible. The object images should be invariant to illumination, rotation and translation during acquisition. It should be a kind of barcode/hash without any additional symbol, based only on the structure of the object. Is there a name for such a task? I think it is not really a classification in the classical sense. I guess it might be just a clever way of finding suitable features for each individual object structure and suitable distance function. Sorry for the long post, I appreciate any help. submitted by /u/Temporary_Lab769 [link] [comments]  ( 1 min )
    [D] Including outer objects in RNN / CNN
    Hello there, Which layer or structure would you append to existing machine learning architectures like yolov5 in order to not only detect the specific object, but also the object which it is part of? Lets say there are xray images of laptops: The laptop itself will be detected and also something like the hard drive or battery inside of it. Is it possible to make the CNN/RNN aware of the fact that the hard drive or battery is inside the Laptop? Hope someone can tell what i mean. Regards David submitted by /u/rohrivibes [link] [comments]  ( 1 min )
    [D] PhD in knowledge representation and reasoning for autonomous agent: research landscape
    I have been offered a PhD in domain of knowledge representation and reasoning for autonomous agents. Goal is to use represent textual rules and world knowledge and then use those represented knowledge for reasoning, so that motion of autonomous agent can be predicted. I have question regarding the current landscape of knowledge representation and reasoning. I see more and more work in data focused model and old Logic and associated paths fading out. Phd project problem itself looks interesting as it focus on work where there will be less need of data and can plan motion in unseen scenarios. But I am concerned about the future career prospective in this domain where this problem is tackled by knowledge representation and reasoning. As I can see there is less and less funding in this domain. What is your take on future landscape of research direction in this domain? submitted by /u/human_treadstone [link] [comments]  ( 2 min )
    [P] My blog on ML model evaluation (Bayes optimal decisions, ROC curve, LLR calibration)
    I have published 3 articles about ML model evaluation on my personal blog. Just finished the 3 installment, so I am keen to share and get some feedback. I cover frameworks traditionally used in ML like ROC curves, but from a Bayes decision perspective, which I have been struggling to find in textbooks/tutorials. The 3rd part is about the evaluation of log-likelihood calibrated models. Hope you will find it interesting/useful! https://mkffl.github.io/2021/10/18/Decisions-Part-1.html https://mkffl.github.io/2021/10/28/Decisions-Part-2.html https://mkffl.github.io/2022/03/02/Decisions-Part-3.html And the underlying code for reproducibility https://github.com/mkffl/decisions submitted by /u/mkffl [link] [comments]  ( 1 min )
    [P] ormb: Docker for Your Models, Help You Manage Models Better
    github.com/kleveross/ormb ormb helps you manage your Machine Learning/Deep Learning models with docker container image registry. It makes your models easy to create, version, share and publish. ``` Save the model in local cache first $ ormb save gaocegege/fashion_model:v1 ref: gaocegege/fashion_model:v1 digest: 6b08cd25d01f71a09c1eb852b3a696ee2806abc749628de28a71b507f9eab996 size: 162.1 KiB format: SavedModel v1: saved Push the model from local cache to remote registry $ ormb push gaocegege/fashion_model:v1 The push refers to repository [gaocegege/fashion_model] ref: gaocegege/fashion_model:v1 digest: 6b08cd25d01f71a09c1eb852b3a696ee2806abc749628de28a71b507f9eab996 size: 162.1 KiB format: SavedModel v1: pushed to remote (1 layer, 162.1 KiB total) Pull the model from remote registry to local cache $ ormb pull gaocegege/fashion_model:v1 v1: Pulling from gaocegege/fashion_model ref: gaocegege/fashion_model:v1 digest: 6b08cd25d01f71a09c1eb852b3a696ee2806abc749628de28a71b507f9eab996 size: 162.1 KiB Status: Downloaded newer model for gaocegege/fashion_model:v1 Export the model from local cache to current directory $ ormb export gaocegege/fashion_model:v1 ref: localhost/gaocegege/fashion_model:v1 digest: 6b08cd25d01f71a09c1eb852b3a696ee2806abc749628de28a71b507f9eab996 size: 162.1 KiB View the local file directory $ tree examples/SavedModel-fashion examples/SavedModel-fashion ├── model │ ├── saved_model.pb │ └── variables │ ├── variables.data-00000-of-00001 │ └── variables.index ├── ormbfile.yaml └── training-serving.ipynb 2 directories, 5 files ``` submitted by /u/gaocegege [link] [comments]  ( 1 min )
    [D] Deep Generative model with Hierarchical Latent Factors for Time Series Anomaly Detection
    Hi, I have just published my latest medium article. Anomalies are widespread when it comes to working on data. They become vital in time series. So, It is crucial to propose efficient methods to detect and deal with them. This article illustrates a state-of-the-art model called DGHL for anomaly detection. DGHL includes a ConvNet as a Generator and instead of encoding it maximizes the likelihood with the Alternating Back-Propagation algorithms. https://rezayazdanfar.medium.com/deep-generative-model-with-hierarchical-latent-factors-for-time-series-anomaly-detection-8d6eaebad8bc submitted by /u/rezayazdanfar [link] [comments]  ( 1 min )
    [P] app to play with latent diffusion models
    just published “geni”, a new minimal app that uses Latent Diffusion Models. It will not produce DALL-E-ish results but it’s fast and great for playing with prompt engineering. Also, it’s free. would love to have the community playing with it. check it out here: https://geni.vercel.app submitted by /u/viccpopa [link] [comments]  ( 1 min )
    [R] VQ-Flows: Vector Quantized Local Normalizing Flows
    arXiV: https://arxiv.org/abs/2203.11556   Summary: We introduce a novel statistical framework for learning a mixture of local normalizing flows as "chart maps" over the data manifold. Our framework augments the expressivity of recent approaches while preserving the signature property of normalizing flows, that they admit exact density evaluation. We learn a suitable atlas of charts for the data manifold via a vector quantized auto-encoder (VQ-AE) and the distributions over them using a conditional flow. We validate experimentally that our probabilistic framework enables existing approaches to better model data distributions over complex manifolds.​   GitHub: Coming Soon Author here, happy to answer any questions. submitted by /u/tshrjn [link] [comments]  ( 1 min )
  • Open

    Does anyone have a guess as to why my network isn’t working? (more info in comments)
    submitted by /u/-i-hate-this-place- [link] [comments]  ( 1 min )
  • Open

    Guide to Iteratively Tuning GNNs
    Sponsored Post By Luis Bermudez This blog walks through a process for experimenting with hyperparameters, training algorithms and other parameters […] The post Guide to Iteratively Tuning GNNs appeared first on Machine Learning Mastery.  ( 6 min )
    Managing Data for Machine Learning Project
    Big data, labeled data, noisy data. Machine learning projects all need to look at data. Data is a critical aspect […] The post Managing Data for Machine Learning Project appeared first on Machine Learning Mastery.  ( 30 min )
  • Open

    7 Tips for making your code more ‘pythonic’ and elegant
    7 use-cases where you can make your python code more nifty, concise and elegant — without compromising readability. Continue reading on Becoming Human: Artificial Intelligence Magazine »  ( 4 min )
  • Open

    AI and Healthcare: AI as a Triaging Tool for Healthcare
    Healthcare offers one of the biggest areas where AI could impact people. AI in healthcare is already widespread but is expected to grow even further. The global artificial intelligence in healthcare market size was valued at USD 10.4 billion in 2021. It is expected to expand at a compound annual growth rate (CAGR) of 38.4%… Read More »AI and Healthcare: AI as a Triaging Tool for Healthcare The post AI and Healthcare: AI as a Triaging Tool for Healthcare appeared first on Data Science Central.  ( 3 min )
    Fallacy of Becoming Data-driven – Part 2: Cultural Transformation
    In my first blog of the series “Fallacy of Becoming Data-driven – Part 1: Becoming Value-obsessed”, I preached about the critical importance of reframing the conversion away from data-driven to becoming value-obsessed. Instead of focusing on becoming value-driven, organizations need to focus on how to uncover the customer, product, service, and operational insights buried in… Read More »Fallacy of Becoming Data-driven – Part 2: Cultural Transformation The post Fallacy of Becoming Data-driven – Part 2: Cultural Transformation appeared first on Data Science Central.  ( 5 min )
    A Glossary of Knowledge Graph Terms
    As with many fields, knowledge graphs boast a wide array of specialized terms. This guide provides a handy reference to these concepts. Resource Description Framework (RDF) The Resource Description Framework (or RDF) is a conceptual framework established in the early 2000s by the World Wide Web Consortium for describing sets of interrelated assertions. RDF breaks… Read More »A Glossary of Knowledge Graph Terms The post A Glossary of Knowledge Graph Terms appeared first on Data Science Central.  ( 10 min )
  • Open

    Auto-Gait: Automatic Ataxia Risk Assessment with Computer Vision on Gait Task Videos. (arXiv:2203.08215v2 [cs.CV] UPDATED)
    In this paper, we investigated whether we can 1) detect participants with ataxia-specific gait characteristics (risk-prediction), and 2) assess severity of ataxia from gait (severity-assessment) using computer vision. We created a dataset of 155 videos from 89 participants, 24 controls and 65 diagnosed with (or are pre-manifest) spinocerebellar ataxias (SCAs), performing the gait task of the Scale for the Assessment and Rating of Ataxia (SARA) from 11 medical sites located in 8 different states across the United States. We develop a computer vision pipeline to detect, track, and separate out the participants from their surroundings and construct several features from their body pose coordinates to capture gait characteristics like step width, step length, swing, stability, speed, etc. Our risk-prediction model achieves 83.06% accuracy and an 80.23% F1 score. Similarly, our severity-assessment model achieves a mean absolute error (MAE) score of 0.6225 and a Pearson's correlation coefficient score of 0.7268. Our models still performed competitively when evaluated on data from sites not used during training. Furthermore, through feature importance analysis, we found that our models associate wider steps, decreased walking speed, and increased instability with greater ataxia severity, which is consistent with previously established clinical knowledge. Our models create possibilities for remote ataxia assessment in non-clinical settings in the future, which could significantly improve accessibility of ataxia care. Furthermore, our underlying dataset was assembled from a geographically diverse cohort, highlighting its potential to further increase equity. The code used in this study is open to the public, and the anonymized body pose landmark dataset is also available upon request.
    Unconditional Image-Text Pair Generation with Multimodal Cross Quantizer. (arXiv:2204.07537v1 [cs.CV])
    Though deep generative models have gained a lot of attention, most of the existing works are designed for the unimodal generation task. In this paper, we explore a new method for unconditional image-text pair generation. We propose MXQ-VAE, a vector quantization method for multimodal image-text representation. MXQ-VAE accepts a paired image and text as input, and learns a joint quantized representation space, so that the image-text pair can be converted to a sequence of unified indices. Then we can use autoregressive generative models to model the joint image-text representation, and even perform unconditional image-text pair generation. Extensive experimental results demonstrate that our approach effectively generates semantically consistent image-text pair and also enhances meaningful alignment between image and text.  ( 2 min )
    Effects of Multi-Aspect Online Reviews with Unobserved Confounders: Estimation and Implication. (arXiv:2110.01746v2 [cs.LG] UPDATED)
    Online review systems are the primary means through which many businesses seek to build the brand and spread their messages. Prior research studying the effects of online reviews has been mainly focused on a single numerical cause, e.g., ratings or sentiment scores. We argue that such notions of causes entail three key limitations: they solely consider the effects of single numerical causes and ignore different effects of multiple aspects -- e.g., Food, Service -- embedded in the textual reviews; they assume the absence of hidden confounders in observational studies, e.g., consumers' personal preferences; and they overlook the indirect effects of numerical causes that can potentially cancel out the effect of textual reviews on business revenue. We thereby propose an alternative perspective to this single-cause-based effect estimation of online reviews: in the presence of hidden confounders, we consider multi-aspect textual reviews, particularly, their total effects on business revenue and direct effects with the numerical cause -- ratings -- being the mediator. We draw on recent advances in machine learning and causal inference to together estimate the hidden confounders and causal effects. We present empirical evaluations using real-world examples to discuss the importance and implications of differentiating the multi-aspect effects in strategizing business operations.  ( 2 min )
    Multi-domain Integrative Swin Transformer network for Sparse-View Tomographic Reconstruction. (arXiv:2111.14831v7 [eess.IV] UPDATED)
    Decreasing projection views to lower X-ray radiation dose usually leads to severe streak artifacts. To improve image quality from sparse-view data, a Multi-domain Integrative Swin Transformer network (MIST-net) was developed in this article. First, MIST-net incorporated lavish domain features from data, residual-data, image, and residual-image using flexible network architectures, where residual-data and residual-image sub-network was considered as data consistency module to eliminate interpolation and reconstruction errors. Second, a trainable edge enhancement filter was incorporated to detect and protect image edges. Third, a high-quality reconstruction Swin transformer (i.e., Recformer) was designed to capture image global features. The experiment results on numerical and real cardiac clinical datasets with 48-views demonstrated that our proposed MIST-net provided better image quality with more small features and sharp edges than other competitors.
    GCR: Gradient Coreset Based Replay Buffer Selection For Continual Learning. (arXiv:2111.11210v3 [cs.LG] UPDATED)
    Continual learning (CL) aims to develop techniques by which a single model adapts to an increasing number of tasks encountered sequentially, thereby potentially leveraging learnings across tasks in a resource-efficient manner. A major challenge for CL systems is catastrophic forgetting, where earlier tasks are forgotten while learning a new task. To address this, replay-based CL approaches maintain and repeatedly retrain on a small buffer of data selected across encountered tasks. We propose Gradient Coreset Replay (GCR), a novel strategy for replay buffer selection and update using a carefully designed optimization criterion. Specifically, we select and maintain a "coreset" that closely approximates the gradient of all the data seen so far with respect to current model parameters, and discuss key strategies needed for its effective application to the continual learning setting. We show significant gains (2%-4% absolute) over the state-of-the-art in the well-studied offline continual learning setting. Our findings also effectively transfer to online / streaming CL settings, showing upto 5% gains over existing approaches. Finally, we demonstrate the value of supervised contrastive loss for continual learning, which yields a cumulative gain of up to 5% accuracy when combined with our subset selection strategy.
    Learning to Accelerate by the Methods of Step-size Planning. (arXiv:2204.01705v3 [cs.LG] UPDATED)
    Gradient descent is slow to converge for ill-conditioned problems and non-convex problems. An important technique for acceleration is step-size adaptation. The first part of this paper contains a detailed review of step-size adaptation methods, including Polyak step-size, L4, LossGrad, Adam, IDBD, and Hypergradient descent, and the relation of step-size adaptation to meta-gradient methods. In the second part of this paper, we propose a new class of methods of accelerating gradient descent that have some distinctiveness from existing techniques. The new methods, which we call {\em step-size planning}, use the {\em update experience} to learn an improved way of updating the parameters. The methods organize the experience into $K$ steps away from each other to facilitate planning. From the past experience, our planning algorithm, Csawg, learns a step-size model which is a form of multi-step machine that predicts future updates. We extends Csawg to applying step-size planning multiple steps, which leads to further speedup. We discuss and highlight the projection power of the diagonal-matrix step-size for future large scale applications. We show for a convex problem, our methods can surpass the convergence rate of Nesterov's accelerated gradient, $1 - \sqrt{\mu/L}$, where $\mu, L$ are the strongly convex factor of the loss function $F$ and the Lipschitz constant of $F'$, which is the theoretical limit for the convergence rate of first-order methods. On the well-known non-convex Rosenbrock function, our planning methods achieve zero error below 500 gradient evaluations, while gradient descent takes about 10000 gradient evaluations to reach a $10^{-3}$ accuracy. We discuss the connection of step-size planing to planning in reinforcement learning, in particular, Dyna architectures.  ( 2 min )
    Transfer Learning for Instance Segmentation of Waste Bottles using Mask R-CNN Algorithm. (arXiv:2204.07437v1 [cs.CV])
    This paper proposes a methodological approach with a transfer learning scheme for plastic waste bottle detection and instance segmentation using the \textit{mask region proposal convolutional neural network} (Mask R-CNN). Plastic bottles constitute one of the major pollutants posing a serious threat to the environment both in oceans and on land. The automated identification and segregation of bottles can facilitate plastic waste recycling. We prepare a custom-made dataset of 192 bottle images with pixel-by pixel-polygon annotation for the automatic segmentation task. The proposed transfer learning scheme makes use of a Mask R-CNN model pre-trained on the Microsoft COCO dataset. We present a comprehensive scheme for fine-tuning the base pre-trained Mask-RCNN model on our custom dataset. Our final fine-tuned model has achieved 59.4 \textit{mean average precision} (mAP), which corresponds to the MS COCO metric. The results indicate a promising application of deep learning for detecting waste bottles.  ( 2 min )
    Big-means: Less is More for K-means Clustering. (arXiv:2204.07485v1 [cs.LG])
    K-means clustering plays a vital role in data mining. However, its performance drastically drops when applied to huge amounts of data. We propose a new heuristic that is built on the basis of regular K-means for faster and more accurate big data clustering using the "less is more" and MSSC decomposition approaches. The main advantage of the proposed algorithm is that it naturally turns the K-means local search into global one through the process of decomposition of the MSSC problem. On one hand, decomposition of the MSSC problem into smaller subproblems reduces the computational complexity and allows for their parallel processing. On the other hand, the MSSC decomposition provides a new method for the natural data-driven shaking of the incumbent solution while introducing a new neighborhood structure for the solution of the MSSC problem. This leads to a new heuristic that improves K-means in big data conditions. The scalability of the algorithm to big data can be easily adjusted by choosing the appropriate number of subproblems and their size. The proposed algorithm is both scalable and accurate. In our experiments it outperforms all recent state-of-the-art algorithms for the MSSC in terms of time as well as the solution quality.  ( 2 min )
    Towards PAC Multi-Object Detection and Tracking. (arXiv:2204.07482v1 [cs.CV])
    Accurately detecting and tracking multi-objects is important for safety-critical applications such as autonomous navigation. However, it remains challenging to provide guarantees on the performance of state-of-the-art techniques based on deep learning. We consider a strategy known as conformal prediction, which predicts sets of labels instead of a single label; in the classification and regression settings, these algorithms can guarantee that the true label lies within the prediction set with high probability. Building on these ideas, we propose multi-object detection and tracking algorithms that come with probably approximately correct (PAC) guarantees. They do so by constructing both a prediction set around each object detection as well as around the set of edge transitions; given an object, the detection prediction set contains its true bounding box with high probability, and the edge prediction set contains its true transition across frames with high probability. We empirically demonstrate that our method can detect and track objects with PAC guarantees on the COCO and MOT-17 datasets.
    A Reinforcement Learning Approach to Parameter Selection for Distributed Optimal Power Flow. (arXiv:2110.11991v2 [eess.SY] UPDATED)
    With the increasing penetration of distributed energy resources, distributed optimization algorithms have attracted significant attention for power systems applications due to their potential for superior scalability, privacy, and robustness to a single point-of-failure. The Alternating Direction Method of Multipliers (ADMM) is a popular distributed optimization algorithm; however, its convergence performance is highly dependent on the selection of penalty parameters, which are usually chosen heuristically. In this work, we use reinforcement learning (RL) to develop an adaptive penalty parameter selection policy for the AC optimal power flow (ACOPF) problem solved via ADMM with the goal of minimizing the number of iterations until convergence. We train our RL policy using deep Q-learning, and show that this policy can result in significantly accelerated convergence (up to a 59% reduction in the number of iterations compared to existing, curvature-informed penalty parameter selection methods). Furthermore, we show that our RL policy demonstrates promise for generalizability, performing well under unseen loading schemes as well as under unseen losses of lines and generators (up to a 50% reduction in iterations). This work thus provides a proof-of-concept for using RL for parameter selection in ADMM for power systems applications.
    Sequential Aggregation and Rematerialization: Distributed Full-batch Training of Graph Neural Networks on Large Graphs. (arXiv:2111.06483v3 [cs.LG] UPDATED)
    We present the Sequential Aggregation and Rematerialization (SAR) scheme for distributed full-batch training of Graph Neural Networks (GNNs) on large graphs. Large-scale training of GNNs has recently been dominated by sampling-based methods and methods based on non-learnable message passing. SAR on the other hand is a distributed technique that can train any GNN type directly on an entire large graph. The key innovation in SAR is the distributed sequential rematerialization scheme which sequentially re-constructs then frees pieces of the prohibitively large GNN computational graph during the backward pass. This results in excellent memory scaling behavior where the memory consumption per worker goes down linearly with the number of workers, even for densely connected graphs. Using SAR, we report the largest applications of full-batch GNN training to-date, and demonstrate large memory savings as the number of workers increases. We also present a general technique based on kernel fusion and attention-matrix rematerialization to optimize both the runtime and memory efficiency of attention-based models. We show that, coupled with SAR, our optimized attention kernels lead to significant speedups and memory savings in attention-based GNNs.We made the SAR GNN training library publicy available: \url{https://github.com/IntelLabs/SAR}.
    Uncertainty-Aware Text-to-Program for Question Answering on Structured Electronic Health Records. (arXiv:2203.06918v2 [cs.CL] UPDATED)
    Question Answering on Electronic Health Records (EHR-QA) has a significant impact on the healthcare domain, and it is being actively studied. Previous research on structured EHR-QA focuses on converting natural language queries into query language such as SQL or SPARQL (NLQ2Query), so the problem scope is limited to pre-defined data types by the specific query language. In order to expand the EHR-QA task beyond this limitation to handle multi-modal medical data and solve complex inference in the future, more primitive systemic language is needed. In this paper, we design the program-based model (NLQ2Program) for EHR-QA as the first step towards the future direction. We tackle MIMICSPARQL*, the graph-based EHR-QA dataset, via a program-based approach in a semi-supervised manner in order to overcome the absence of gold programs. Without the gold program, our proposed model shows comparable performance to the previous state-of-the-art model, which is an NLQ2Query model (0.9% gain). In addition, for a reliable EHR-QA model, we apply the uncertainty decomposition method to measure the ambiguity in the input question. We empirically confirmed data uncertainty is most indicative of the ambiguity in the input question.
    The Importance of Landscape Features for Performance Prediction of Modular CMA-ES Variants. (arXiv:2204.07431v1 [cs.NE])
    Selecting the most suitable algorithm and determining its hyperparameters for a given optimization problem is a challenging task. Accurately predicting how well a certain algorithm could solve the problem is hence desirable. Recent studies in single-objective numerical optimization show that supervised machine learning methods can predict algorithm performance using landscape features extracted from the problem instances. Existing approaches typically treat the algorithms as black-boxes, without consideration of their characteristics. To investigate in this work if a selection of landscape features that depends on algorithms properties could further improve regression accuracy, we regard the modular CMA-ES framework and estimate how much each landscape feature contributes to the best algorithm performance regression models. Exploratory data analysis performed on this data indicate that the set of most relevant features does not depend on the configuration of individual modules, but the influence that these features have on regression accuracy does. In addition, we have shown that by using classifiers that take the features relevance on the model accuracy, we are able to predict the status of individual modules in the CMA-ES configurations.  ( 2 min )
    Two-Step Meta-Learning for Time-Series Forecasting Ensemble. (arXiv:2011.10545v2 [stat.ML] UPDATED)
    Amounts of historical data collected increase and business intelligence applicability with automatic forecasting of time series are in high demand. While no single time series modeling method is universal to all types of dynamics, forecasting using an ensemble of several methods is often seen as a compromise. Instead of fixing ensemble diversity and size, we propose to predict these aspects adaptively using meta-learning. Meta-learning here considers two separate random forest regression models, built on 390 time-series features, to rank 22 univariate forecasting methods and recommend ensemble size. The forecasting ensemble is consequently formed from methods ranked as the best, and forecasts are pooled using either simple or weighted average (with a weight corresponding to reciprocal rank). The proposed approach was tested on 12561 micro-economic time-series (expanded to 38633 for various forecasting horizons) of M4 competition where meta-learning outperformed Theta and Comb benchmarks by relative forecasting errors for all data types and horizons. Best overall results were achieved by weighted pooling with a symmetric mean absolute percentage error of 9.21% versus 11.05% obtained using the Theta method.  ( 2 min )
    On the Importance of Firth Bias Reduction in Few-Shot Classification. (arXiv:2110.02529v2 [cs.CV] UPDATED)
    Learning accurate classifiers for novel categories from very few examples, known as few-shot image classification, is a challenging task in statistical machine learning and computer vision. The performance in few-shot classification suffers from the bias in the estimation of classifier parameters; however, an effective underlying bias reduction technique that could alleviate this issue in training few-shot classifiers has been overlooked. In this work, we demonstrate the effectiveness of Firth bias reduction in few-shot classification. Theoretically, Firth bias reduction removes the $O(N^{-1})$ first order term from the small-sample bias of the Maximum Likelihood Estimator. Here we show that the general Firth bias reduction technique simplifies to encouraging uniform class assignment probabilities for multinomial logistic classification, and almost has the same effect in cosine classifiers. We derive an easy-to-implement optimization objective for Firth penalized multinomial logistic and cosine classifiers, which is equivalent to penalizing the cross-entropy loss with a KL-divergence between the uniform label distribution and the predictions. Then, we empirically evaluate that it is consistently effective across the board for few-shot image classification, regardless of (1) the feature representations from different backbones, (2) the number of samples per class, and (3) the number of classes. Finally, we show the robustness of Firth bias reduction, in the case of imbalanced data distribution. Our implementation is available at https://github.com/ehsansaleh/firth_bias_reduction
    Efficient Architecture Search for Diverse Tasks. (arXiv:2204.07554v1 [cs.LG])
    While neural architecture search (NAS) has enabled automated machine learning (AutoML) for well-researched areas, its application to tasks beyond computer vision is still under-explored. As less-studied domains are precisely those where we expect AutoML to have the greatest impact, in this work we study NAS for efficiently solving diverse problems. Seeking an approach that is fast, simple, and broadly applicable, we fix a standard convolutional network (CNN) topology and propose to search for the right kernel sizes and dilations its operations should take on. This dramatically expands the model's capacity to extract features at multiple resolutions for different types of data while only requiring search over the operation space. To overcome the efficiency challenges of naive weight-sharing in this search space, we introduce DASH, a differentiable NAS algorithm that computes the mixture-of-operations using the Fourier diagonalization of convolution, achieving both a better asymptotic complexity and an up-to-10x search time speedup in practice. We evaluate DASH on NAS-Bench-360, a suite of ten tasks designed for benchmarking NAS in diverse domains. DASH outperforms state-of-the-art methods in aggregate, attaining the best-known automated performance on seven tasks. Meanwhile, on six of the ten tasks, the combined search and retraining time is less than 2x slower than simply training a CNN backbone that is far less accurate.
    CryoRL: Reinforcement Learning Enables Efficient Cryo-EM Data Collection. (arXiv:2204.07543v1 [cs.LG])
    Single-particle cryo-electron microscopy (cryo-EM) has become one of the mainstream structural biology techniques because of its ability to determine high-resolution structures of dynamic bio-molecules. However, cryo-EM data acquisition remains expensive and labor-intensive, requiring substantial expertise. Structural biologists need a more efficient and objective method to collect the best data in a limited time frame. We formulate the cryo-EM data collection task as an optimization problem in this work. The goal is to maximize the total number of good images taken within a specified period. We show that reinforcement learning offers an effective way to plan cryo-EM data collection, successfully navigating heterogenous cryo-EM grids. The approach we developed, cryoRL, demonstrates better performance than average users for data collection under similar settings.
    Stretching Sentence-pair NLI Models to Reason over Long Documents and Clusters. (arXiv:2204.07447v1 [cs.CL])
    Natural Language Inference (NLI) has been extensively studied by the NLP community as a framework for estimating the semantic relation between sentence pairs. While early work identified certain biases in NLI models, recent advancements in modeling and datasets demonstrated promising performance. In this work, we further explore the direct zero-shot applicability of NLI models to real applications, beyond the sentence-pair setting they were trained on. First, we analyze the robustness of these models to longer and out-of-domain inputs. Then, we develop new aggregation methods to allow operating over full documents, reaching state-of-the-art performance on the ContractNLI dataset. Interestingly, we find NLI scores to provide strong retrieval signals, leading to more relevant evidence extractions compared to common similarity-based methods. Finally, we go further and investigate whole document clusters to identify both discrepancies and consensus among sources. In a test case, we find real inconsistencies between Wikipedia pages in different languages about the same topic.  ( 2 min )
    Soft Truncation: A Universal Training Technique of Score-based Diffusion Model for High Precision Score Estimation. (arXiv:2106.05527v4 [cs.LG] UPDATED)
    Recent advances in diffusion models bring the state-of-the art performance on image generation tasks. However, empirical results on previous research in diffusion models imply that there is an inverse correlation on performances for density estimation and sample generation. This paper analyzes that the inverse correlation arises because density estimation is mostly contributed from small diffusion time, whereas sample generation mainly depends on large diffusion time. However, training score network on both small and large diffusion time is demanding because of the loss imbalance issue. To successfully train the score network on both small and large diffusion time, this paper introduces a training technique, Soft Truncation, that softens the truncation time for every mini-batch update, which is universally applicable to any types of diffusion models. It turns out that Soft Truncation is equivalent to a diffusion model with a general weight, and we prove the variational bound of the general weighted diffusion model. In view of this variational bound, Soft Truncation becomes a natural way to train the score network. In experiments, Soft Truncation achieves the state-of-the-art performance on CIFAR-10, CelebA, CelebA-HQ $256\times 256$, and STL-10 datasets.  ( 2 min )
    Deep learning model solves change point detection for multiple change types. (arXiv:2204.07403v1 [cs.LG])
    A change points detection aims to catch an abrupt disorder in data distribution. Common approaches assume that there are only two fixed distributions for data: one before and another after a change point. Real-world data are richer than this assumption. There can be multiple different distributions before and after a change. We propose an approach that works in the multiple-distributions scenario. Our approach learn representations for semi-structured data suitable for change point detection, while a common classifiers-based approach fails. Moreover, our model is more robust, when predicting change points. The datasets used for benchmarking are sequences of images with and without change points in them.  ( 2 min )
    Characterizing metastable states with the help of machine learning. (arXiv:2204.07391v1 [physics.comp-ph])
    Present-day atomistic simulations generate long trajectories of ever more complex systems. Analyzing these data, discovering metastable states, and uncovering their nature is becoming increasingly challenging. In this paper, we first use the variational approach to conformation dynamics to discover the slowest dynamical modes of the simulations. This allows the different metastable states of the system to be located and organized hierarchically. The physical descriptors that characterize metastable states are discovered by means of a machine learning method. We show in the cases of two proteins, Chignolin and Bovine Pancreatic Trypsin Inhibitor, how such analysis can be effortlessly performed in a matter of seconds. Another strength of our approach is that it can be applied to the analysis of both unbiased and biased simulations.  ( 2 min )
    Enforcing fairness in private federated learning via the modified method of differential multipliers. (arXiv:2109.08604v2 [cs.LG] UPDATED)
    Federated learning with differential privacy, or private federated learning, provides a strategy to train machine learning models while respecting users' privacy. However, differential privacy can disproportionately degrade the performance of the models on under-represented groups, as these parts of the distribution are difficult to learn in the presence of noise. Existing approaches for enforcing fairness in machine learning models have considered the centralized setting, in which the algorithm has access to the users' data. This paper introduces an algorithm to enforce group fairness in private federated learning, where users' data does not leave their devices. First, the paper extends the modified method of differential multipliers to empirical risk minimization with fairness constraints, thus providing an algorithm to enforce fairness in the central setting. Then, this algorithm is extended to the private federated learning setting. The proposed algorithm, \texttt{FPFL}, is tested on a federated version of the Adult dataset and an "unfair" version of the FEMNIST dataset. The experiments on these datasets show how private federated learning accentuates unfairness in the trained models, and how FPFL is able to mitigate such unfairness.  ( 2 min )
    Theory-inspired Parameter Control Benchmarks for Dynamic Algorithm Configuration. (arXiv:2202.03259v2 [cs.NE] UPDATED)
    It has long been observed that the performance of evolutionary algorithms and other randomized search heuristics can benefit from a non-static choice of the parameters that steer their optimization behavior. Mechanisms that identify suitable configurations on the fly ("parameter control") or via a dedicated training process ("dynamic algorithm configuration") are therefore an important component of modern evolutionary computation frameworks. Several approaches to address the dynamic parameter setting problem exist, but we barely understand which ones to prefer for which applications. As in classical benchmarking, problem collections with a known ground truth can offer very meaningful insights in this context. Unfortunately, settings with well-understood control policies are very rare. One of the few exceptions for which we know which parameter settings minimize the expected runtime is the LeadingOnes problem. We extend this benchmark by analyzing optimal control policies that can select the parameters only from a given portfolio of possible values. This also allows us to compute optimal parameter portfolios of a given size. We demonstrate the usefulness of our benchmarks by analyzing the behavior of the DDQN reinforcement learning approach for dynamic algorithm configuration.
    Simple but Effective: CLIP Embeddings for Embodied AI. (arXiv:2111.09888v2 [cs.CV] UPDATED)
    Contrastive language image pretraining (CLIP) encoders have been shown to be beneficial for a range of visual tasks from classification and detection to captioning and image manipulation. We investigate the effectiveness of CLIP visual backbones for Embodied AI tasks. We build incredibly simple baselines, named EmbCLIP, with no task specific architectures, inductive biases (such as the use of semantic maps), auxiliary tasks during training, or depth maps -- yet we find that our improved baselines perform very well across a range of tasks and simulators. EmbCLIP tops the RoboTHOR ObjectNav leaderboard by a huge margin of 20 pts (Success Rate). It tops the iTHOR 1-Phase Rearrangement leaderboard, beating the next best submission, which employs Active Neural Mapping, and more than doubling the % Fixed Strict metric (0.08 to 0.17). It also beats the winners of the 2021 Habitat ObjectNav Challenge, which employ auxiliary tasks, depth maps, and human demonstrations, and those of the 2019 Habitat PointNav Challenge. We evaluate the ability of CLIP's visual representations at capturing semantic information about input observations -- primitives that are useful for navigation-heavy embodied tasks -- and find that CLIP's representations encode these primitives more effectively than ImageNet-pretrained backbones. Finally, we extend one of our baselines, producing an agent capable of zero-shot object navigation that can navigate to objects that were not used as targets during training. Our code and models are available at https://github.com/allenai/embodied-clip  ( 2 min )
    Nanorobot queue: Cooperative treatment of cancer based on team member communication and image processing. (arXiv:2111.11236v3 [cs.RO] UPDATED)
    Although nanorobots have been used as clinical prescriptions for work such as gastroscopy, and even photoacoustic tomography technology has been proposed to control nanorobots to deliver drugs at designated delivery points in real time, and there are cases of eliminating "superbacteria" in blood through nanorobots, most technologies are immature, either with low efficiency or low accuracy, Either it can not be mass produced, so the most effective way to treat cancer diseases at this stage is through chemotherapy and radiotherapy. Patients are suffering and can not be cured. Therefore, this paper proposes an ideal model of a treatment method that can completely cure cancer, a cooperative treatment method based on nano robot queue through team member communication and computer vision image classification (target detection).
    Grassmannian Optimization for Online Tensor Completion and Tracking with the t-SVD. (arXiv:2001.11419v4 [eess.SP] UPDATED)
    We propose a new fast streaming algorithm for the tensor completion problem of imputing missing entries of a low-tubal-rank tensor using the tensor singular value decomposition (t-SVD) algebraic framework. We show the t-SVD is a specialization of the well-studied block-term decomposition for third-order tensors, and we present an algorithm under this model that can track changing free submodules from incomplete streaming 2-D data. The proposed algorithm uses principles from incremental gradient descent on the Grassmann manifold of subspaces to solve the tensor completion problem with linear complexity and constant memory in the number of time samples. We provide a local expected linear convergence result for our algorithm. Our empirical results are competitive in accuracy but much faster in compute time than state-of-the-art tensor completion algorithms on real applications to recover temporal chemo-sensing and MRI data under limited sampling.
    An interpretable machine learning approach for ferroalloys consumptions. (arXiv:2204.07421v1 [cs.LG])
    This paper is devoted to a practical method for ferroalloys consumption modeling and optimization. We consider the problem of selecting the optimal process control parameters based on the analysis of historical data from sensors. We developed approach, which predicts results of chemical reactions and give ferroalloys consumption recommendation. The main features of our method are easy interpretation and noise resistance. Our approach is based on k-means clustering algorithm, decision trees and linear regression. The main idea of the method is to identify situations where processes go similarly. For this, we propose using a k-means based dataset clustering algorithm and a classification algorithm to determine the cluster. This algorithm can be also applied to various technological processes, in this article, we demonstrate its application in metallurgy. To test the application of the proposed method, we used it to optimize ferroalloys consumption in Basic Oxygen Furnace steelmaking when finishing steel in a ladle furnace. The minimum required element content for a given steel grade was selected as the predictive model's target variable, and the required amount of the element to be added to the melt as the optimized variable. Keywords: Clustering, Machine Learning, Linear Regression, Steelmaking, Optimization, Gradient Boosting, Artificial Intelligence, Decision Trees, Recommendation services
    Tighter Theory for Local SGD on Identical and Heterogeneous Data. (arXiv:1909.04746v4 [cs.LG] UPDATED)
    We provide a new analysis of local SGD, removing unnecessary assumptions and elaborating on the difference between two data regimes: identical and heterogeneous. In both cases, we improve the existing theory and provide values of the optimal stepsize and optimal number of local iterations. Our bounds are based on a new notion of variance that is specific to local SGD methods with different data. The tightness of our results is guaranteed by recovering known statements when we plug $H=1$, where $H$ is the number of local steps. The empirical evidence further validates the severe impact of data heterogeneity on the performance of local SGD.
    SuperCone: Unified User Segmentation over Heterogeneous Experts via Concept Meta-learning. (arXiv:2203.07029v2 [cs.LG] UPDATED)
    We study the problem of user segmentation: given a set of users and one or more predefined groups or segments, assign users to their corresponding segments. As an example, for a segment indicating particular interest in a certain area of sports or entertainment, the task will be to predict whether each single user will belong to the segment. However, there may exist numerous long tail prediction tasks that suffer from data availability and may be of heterogeneous nature, which make it hard to capture using single off the shelf model architectures. In this work, we present SuperCone, our unified predicative segments system that addresses the above challenges. It builds on top of a flat concept representation that summarizes each user's heterogeneous digital footprints, and uniformly models each of the prediction task using an approach called "super learning ", that is, combining prediction models with diverse architectures or learning method that are not compatible with each other. Following this, we provide an end to end approach that learns to flexibly attend to best suited heterogeneous experts adaptively, while at the same time incorporating deep representations of the input concepts that augments the above experts. Experiments show that SuperCone significantly outperform state-of-the-art recommendation and ranking algorithms on a wide range of predicative segment tasks and public structured data learning benchmarks.
    Safe Reinforcement Learning Using Black-Box Reachability Analysis. (arXiv:2204.07417v1 [cs.RO])
    Reinforcement learning (RL) is capable of sophisticated motion planning and control for robots in uncertain environments. However, state-of-the-art deep RL approaches typically lack safety guarantees, especially when the robot and environment models are unknown. To justify widespread deployment, robots must respect safety constraints without sacrificing performance. Thus, we propose a Black-box Reachability-based Safety Layer (BRSL) with three main components: (1) data-driven reachability analysis for a black-box robot model, (2) a trajectory rollout planner that predicts future actions and observations using an ensemble of neural networks trained online, and (3) a differentiable polytope collision check between the reachable set and obstacles that enables correcting unsafe actions. In simulation, BRSL outperforms other state-of-the-art safe RL methods on a Turtlebot 3, a quadrotor, and a trajectory-tracking point mass with an unsafe set adjacent to the area of highest reward.
    Model-Based Deep Learning of Joint Probabilistic and Geometric Shaping for Optical Communication. (arXiv:2204.07457v1 [eess.SP])
    Autoencoder-based deep learning is applied to jointly optimize geometric and probabilistic constellation shaping for optical coherent communication. The optimized constellation shaping outperforms the 256 QAM Maxwell-Boltzmann probabilistic distribution with extra 0.05 bits/4D-symbol mutual information for 64 GBd transmission over 170 km SMF link.
    A Machine Learning Tutorial for Operational Meteorology, Part I: Traditional Machine Learning. (arXiv:2204.07492v1 [physics.ao-ph])
    Recently, the use of machine learning in meteorology has increased greatly. While many machine learning methods are not new, university classes on machine learning are largely unavailable to meteorology students and are not required to become a meteorologist. The lack of formal instruction has contributed to perception that machine learning methods are 'black boxes' and thus end-users are hesitant to apply the machine learning methods in their every day workflow. To reduce the opaqueness of machine learning methods and lower hesitancy towards machine learning in meteorology, this paper provides a survey of some of the most common machine learning methods. A familiar meteorological example is used to contextualize the machine learning methods while also discussing machine learning topics using plain language. The following machine learning methods are demonstrated: linear regression; logistic regression; decision trees; random forest; gradient boosted decision trees; naive Bayes; and support vector machines. Beyond discussing the different methods, the paper also contains discussions on the general machine learning process as well as best practices to enable readers to apply machine learning to their own datasets. Furthermore, all code (in the form of Jupyter notebooks and Google Colaboratory notebooks) used to make the examples in the paper is provided in an effort to catalyse the use of machine learning in meteorology.
    Latent Gaussian Model Boosting. (arXiv:2105.08966v4 [cs.LG] UPDATED)
    Latent Gaussian models and boosting are widely used techniques in statistics and machine learning. Tree-boosting shows excellent prediction accuracy on many data sets, but potential drawbacks are that it assumes conditional independence of samples, produces discontinuous predictions for, e.g., spatial data, and it can have difficulty with high-cardinality categorical variables. Latent Gaussian models, such as Gaussian process and grouped random effects models, are flexible prior models which explicitly model dependence among samples and which allow for efficient learning of predictor functions and for making probabilistic predictions. However, existing latent Gaussian models usually assume either a zero or a linear prior mean function which can be an unrealistic assumption. This article introduces a novel approach that combines boosting and latent Gaussian models to remedy the above-mentioned drawbacks and to leverage the advantages of both techniques. We obtain increased prediction accuracy compared to existing approaches in both simulated and real-world data experiments.
    Super Resolution for Turbulent Flows in 2D: Stabilized Physics Informed Neural Networks. (arXiv:2204.07413v1 [math.NA])
    We propose a new design of a neural network for solving a zero shot super resolution problem for turbulent flows. We embed Luenberger-type observer into the network's architecture to inform the network of the physics of the process, and to provide error correction and stabilization mechanisms. In addition, to compensate for decrease of observer's performance due to the presence of unknown destabilizing forcing, the network is designed to estimate the contribution of the unknown forcing implicitly from the data over the course of training. By running a set of numerical experiments, we demonstrate that the proposed network does recover unknown forcing from data and is capable of predicting turbulent flows in high resolution from low resolution noisy observations.  ( 2 min )
    Invariance Through Inference. (arXiv:2112.08526v2 [cs.LG] UPDATED)
    We introduce a general approach, called Invariance through Inference, for improving the test-time performance of an agent in deployment environments with unknown perceptual variations. Instead of producing invariant visual features through interpolation, invariance through inference turns adaptation at deployment-time into an unsupervised learning problem. This is achieved in practice by deploying a straightforward algorithm that tries to match the distribution of latent features to the agent's prior experience, without relying on paired data. Although simple, we show that this idea leads to surprising improvements on a variety of adaptation scenarios without access to deployment-time rewards, including changes in scene content, camera poses, and lighting conditions. We present results on challenging domains including distractor control suite and sim-to-real transfer for image-based robot manipulation.
    Narcissus: A Practical Clean-Label Backdoor Attack with Limited Information. (arXiv:2204.05255v2 [cs.CR] UPDATED)
    Backdoor attacks insert malicious data into a training set so that, during inference time, it misclassifies inputs that have been patched with a backdoor trigger as the malware specified label. For backdoor attacks to bypass human inspection, it is essential that the injected data appear to be correctly labeled. The attacks with such property are often referred to as "clean-label attacks." Existing clean-label backdoor attacks require knowledge of the entire training set to be effective. Obtaining such knowledge is difficult or impossible because training data are often gathered from multiple sources (e.g., face images from different users). It remains a question whether backdoor attacks still present a real threat. This paper provides an affirmative answer to this question by designing an algorithm to mount clean-label backdoor attacks based only on the knowledge of representative examples from the target class. With poisoning equal to or less than 0.5% of the target-class data and 0.05% of the training set, we can train a model to classify test examples from arbitrary classes into the target class when the examples are patched with a backdoor trigger. Our attack works well across datasets and models, even when the trigger presents in the physical world. We explore the space of defenses and find that, surprisingly, our attack can evade the latest state-of-the-art defenses in their vanilla form, or after a simple twist, we can adapt to the downstream defenses. We study the cause of the intriguing effectiveness and find that because the trigger synthesized by our attack contains features as persistent as the original semantic features of the target class, any attempt to remove such triggers would inevitably hurt the model accuracy first.
    Experimentally realized memristive memory augmented neural network. (arXiv:2204.07429v1 [cs.ET])
    Lifelong on-device learning is a key challenge for machine intelligence, and this requires learning from few, often single, samples. Memory augmented neural network has been proposed to achieve the goal, but the memory module has to be stored in an off-chip memory due to its size. Therefore the practical use has been heavily limited. Previous works on emerging memory-based implementation have difficulties in scaling up because different modules with various structures are difficult to integrate on the same chip and the small sense margin of the content addressable memory for the memory module heavily limited the degree of mismatch calculation. In this work, we implement the entire memory augmented neural network architecture in a fully integrated memristive crossbar platform and achieve an accuracy that closely matches standard software on digital hardware for the Omniglot dataset. The successful demonstration is supported by implementing new functions in crossbars in addition to widely reported matrix multiplications. For example, the locality-sensitive hashing operation is implemented in crossbar arrays by exploiting the intrinsic stochasticity of memristor devices. Besides, the content-addressable memory module is realized in crossbars, which also supports the degree of mismatches. Simulations based on experimentally validated models show such an implementation can be efficiently scaled up for one-shot learning on the Mini-ImageNet dataset. The successful demonstration paves the way for practical on-device lifelong learning and opens possibilities for novel attention-based algorithms not possible in conventional hardware.
    Rethinking Machine Learning Model Evaluation in Pathology. (arXiv:2204.05205v2 [eess.IV] UPDATED)
    Machine Learning has been applied to pathology images in research and clinical practice with promising outcomes. However, standard ML models often lack the rigorous evaluation required for clinical decisions. Machine learning techniques for natural images are ill-equipped to deal with pathology images that are significantly large and noisy, require expensive labeling, are hard to interpret, and are susceptible to spurious correlations. We propose a set of practical guidelines for ML evaluation in pathology that address the above concerns. The paper includes measures for setting up the evaluation framework, effectively dealing with variability in labels, and a recommended suite of tests to address issues related to domain shift, robustness, and confounding variables. We hope that the proposed framework will bridge the gap between ML researchers and domain experts, leading to wider adoption of ML techniques in pathology and improving patient outcomes.
    Synthesizing Informative Training Samples with GAN. (arXiv:2204.07513v1 [cs.LG])
    Remarkable progress has been achieved in synthesizing photo-realistic images with generative adversarial neural networks (GANs). Recently, GANs are utilized as the training sample generator when obtaining or storing real training data is expensive even infeasible. However, traditional GANs generated images are not as informative as the real training samples when being used to train deep neural networks. In this paper, we propose a novel method to synthesize Informative Training samples with GAN (IT-GAN). Specifically, we freeze a pre-trained GAN model and learn the informative latent vectors that corresponds to informative training samples. The synthesized images are required to preserve information for training deep neural networks rather than visual reality or fidelity. Experiments verify that the deep neural networks can learn faster and achieve better performance when being trained with our IT-GAN generated images. We also show that our method is a promising solution to dataset condensation problem.
    Streaming Align-Refine for Non-autoregressive Deliberation. (arXiv:2204.07556v1 [cs.CL])
    We propose a streaming non-autoregressive (non-AR) decoding algorithm to deliberate the hypothesis alignment of a streaming RNN-T model. Our algorithm facilitates a simple greedy decoding procedure, and at the same time is capable of producing the decoding result at each frame with limited right context, thus enjoying both high efficiency and low latency. These advantages are achieved by converting the offline Align-Refine algorithm to be streaming-compatible, with a novel transformer decoder architecture that performs local self-attentions for both text and audio, and a time-aligned cross-attention at each layer. Furthermore, we perform discriminative training of our model with the minimum word error rate (MWER) criterion, which has not been done in the non-AR decoding literature. Experiments on voice search datasets and Librispeech show that with reasonable right context, our streaming model performs as well as the offline counterpart, and discriminative training leads to further WER gain when the first-pass model has small capacity.
    Conditional Hierarchical Bayesian Tucker Decomposition for Genetic Data Analysis. (arXiv:1911.12426v3 [cs.LG] UPDATED)
    We develop methods for reducing the dimensionality of large data sets, common in biomedical applications. Learning about patients using genetic data often includes more features than observations, which makes direct supervised learning difficult. One method of reducing the feature space is to use latent Dirichlet allocation to group genetic variants in an unsupervised manner. Latent Dirichlet allocation describes a patient as a mixture of topics corresponding to genetic variants. This can be generalized as a Bayesian tensor decomposition to account for multiple feature variables. Our most significant contributions are with hierarchical topic modeling. We design distinct methods of incorporating hierarchical topic modeling, based on nested Chinese restaurant processes and Pachinko Allocation Machine, into Bayesian tensor decomposition. We apply these models to examine patients with one of four common types of cancer (breast, lung, prostate, and colorectal) and siblings with and without autism spectrum disorder. We linked the genes with their biological pathways and combine this information into a tensor of patients, counts of their genetic variants, and the genes' membership in pathways. We find that our trained models outperform baseline models, with respect to coherence, by up to 40%.
    Novelty Search in Representational Space for Sample Efficient Exploration. (arXiv:2009.13579v3 [cs.LG] UPDATED)
    We present a new approach for efficient exploration which leverages a low-dimensional encoding of the environment learned with a combination of model-based and model-free objectives. Our approach uses intrinsic rewards that are based on the distance of nearest neighbors in the low dimensional representational space to gauge novelty. We then leverage these intrinsic rewards for sample-efficient exploration with planning routines in representational space for hard exploration tasks with sparse rewards. One key element of our approach is the use of information theoretic principles to shape our representations in a way so that our novelty reward goes beyond pixel similarity. We test our approach on a number of maze tasks, as well as a control problem and show that our exploration approach is more sample-efficient compared to strong baselines.
    INSTA-BNN: Binary Neural Network with INSTAnce-aware Threshold. (arXiv:2204.07439v1 [cs.CV])
    Binary Neural Networks (BNNs) have emerged as a promising solution for reducing the memory footprint and compute costs of deep neural networks. BNNs, on the other hand, suffer from information loss because binary activations are limited to only two values, resulting in reduced accuracy. To improve the accuracy, previous studies have attempted to control the distribution of binary activation by manually shifting the threshold of the activation function or making the shift amount trainable. During the process, they usually depended on statistical information computed from a batch. We argue that using statistical data from a batch fails to capture the crucial information for each input instance in BNN computations, and the differences between statistical information computed from each instance need to be considered when determining the binary activation threshold of each instance. Based on the concept, we propose the Binary Neural Network with INSTAnce-aware threshold (INSTA-BNN), which decides the activation threshold value considering the difference between statistical data computed from a batch and each instance. The proposed INSTA-BNN outperforms the baseline by 2.5% and 2.3% on the ImageNet classification task with comparable computing cost, achieving 68.0% and 71.7% top-1 accuracy on ResNet-18 and MobileNetV1 based models, respectively.
    Neural Structured Prediction for Inductive Node Classification. (arXiv:2204.07524v1 [cs.LG])
    This paper studies node classification in the inductive setting, i.e., aiming to learn a model on labeled training graphs and generalize it to infer node labels on unlabeled test graphs. This problem has been extensively studied with graph neural networks (GNNs) by learning effective node representations, as well as traditional structured prediction methods for modeling the structured output of node labels, e.g., conditional random fields (CRFs). In this paper, we present a new approach called the Structured Proxy Network (SPN), which combines the advantages of both worlds. SPN defines flexible potential functions of CRFs with GNNs. However, learning such a model is nontrivial as it involves optimizing a maximin game with high-cost inference. Inspired by the underlying connection between joint and marginal distributions defined by Markov networks, we propose to solve an approximate version of the optimization problem as a proxy, which yields a near-optimal solution, making learning more efficient. Extensive experiments on two settings show that our approach outperforms many competitive baselines.
    Selecting Continuous Life-Like Cellular Automata for Halting Unpredictability: Evolving for Abiogenesis. (arXiv:2204.07541v1 [cs.NE])
    Substantial efforts have been applied to engineer CA with desired emergent properties, such as supporting gliders. Recent work in continuous CA has generated a wide variety of compelling bioreminescent patterns, and the expansion of CA research into continuous numbers, multiple channels, and higher dimensions complicates their study. In this work we devise a strategy for evolving CA and CA patterns in two steps, based on the simple idea that CA are likely to be complex and computationally capable if they support patterns that grow indefinitely as well as patterns that vanish completely, and are difficult to predict the difference in advance. The second part of our strategy evolves patterns by selecting for mobility and conservation of mean cell value. We validate our pattern evolution method by re-discovering gliders in 17 of 17 Lenia CA, and also report 5 new evolved CA that support evolved glider patterns, differing from previously reported Lenia patterns. The CA reported here share neighborhood kernels with previously described Lenia CA, but exhibit a wider range of typical dynamics than their Lenia counterparts. Code for evolving continuous CA is made available under an MIT License.
    Deep Learning-based List Sphere Decoding for Faster-than-Nyquist (FTN) Signaling Detection. (arXiv:2204.07569v1 [cs.IT])
    Faster-than-Nyquist (FTN) signaling is a candidate non-orthonormal transmission technique to improve the spectral efficiency (SE) of future communication systems. However, such improvements of the SE are at the cost of additional computational complexity to remove the intentionally introduced intersymbol interference. In this paper, we investigate the use of deep learning (DL) to reduce the detection complexity of FTN signaling. To eliminate the need of having a noise whitening filter at the receiver, we first present an equivalent FTN signaling model based on using a set of orthonormal basis functions and identify its operation region. Second, we propose a DL-based list sphere decoding (DL-LSD) algorithm that selects and updates the initial radius of the original LSD to guarantee a pre-defined number $N_{\text{L}}$ of lattice points inside the hypersphere. This is achieved by training a neural network to output an approximate initial radius that includes $N_{\text{L}}$ lattice points. At the testing phase, if the hypersphere has more than $N_{\text{L}}$ lattice points, we keep the $N_{\text{L}}$ closest points to the point corresponding to the received FTN signal; however, if the hypersphere has less than $N_{\text{L}}$ points, we increase the approximate initial radius by a value that depends on the standard deviation of the distribution of the output radii from the training phase. Then, the approximate value of the log-likelihood ratio (LLR) is calculated based on the obtained $N_{\text{L}}$ points. Simulation results show that the computational complexity of the proposed DL-LSD is lower than its counterpart of the original LSD by orders of magnitude.
    Accurate ADMET Prediction with XGBoost. (arXiv:2204.07532v1 [q-bio.BM])
    The absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties are important in drug discovery as they define efficacy and safety. Here, we apply an ensemble of features, including fingerprints and descriptors, and a tree-based machine learning model, extreme gradient boosting, for accurate ADMET prediction. Our model performs well in the Therapeutics Data Commons ADMET benchmark group. For 22 tasks, our model is ranked first in 10 tasks and top 3 in 18 tasks.  ( 2 min )
    Revisiting joint decoding based multi-talker speech recognition with DNN acoustic model. (arXiv:2111.00009v2 [eess.AS] UPDATED)
    In typical multi-talker speech recognition systems, a neural network-based acoustic model predicts senone state posteriors for each speaker. These are later used by a single-talker decoder which is applied on each speaker-specific output stream separately. In this work, we argue that such a scheme is sub-optimal and propose a principled solution that decodes all speakers jointly. We modify the acoustic model to predict joint state posteriors for all speakers, enabling the network to express uncertainty about the attribution of parts of the speech signal to the speakers. We employ a joint decoder that can make use of this uncertainty together with higher-level language information. For this, we revisit decoding algorithms used in factorial generative models in early multi-talker speech recognition systems. In contrast with these early works, we replace the GMM acoustic model with DNN, which provides greater modeling power and simplifies part of the inference. We demonstrate the advantage of joint decoding in proof of concept experiments on a mixed-TIDIGITS dataset.
    Adjoined Networks: A Training Paradigm with Applications to Network Compression. (arXiv:2006.05624v5 [cs.LG] UPDATED)
    Compressing deep neural networks while maintaining accuracy is important when we want to deploy large, powerful models in production and/or edge devices. One common technique used to achieve this goal is knowledge distillation. Typically, the output of a static pre-defined teacher (a large base network) is used as soft labels to train and transfer information to a student (or smaller) network. In this paper, we introduce Adjoined Networks, or AN, a learning paradigm that trains both the original base network and the smaller compressed network together. In our training approach, the parameters of the smaller network are shared across both the base and the compressed networks. Using our training paradigm, we can simultaneously compress (the student network) and regularize (the teacher network) any architecture. In this paper, we focus on popular CNN-based architectures used for computer vision tasks. We conduct an extensive experimental evaluation of our training paradigm on various large-scale datasets. Using ResNet-50 as the base network, AN achieves 71.8% top-1 accuracy with only 1.8M parameters and 1.6 GFLOPs on the ImageNet data-set. We further propose Differentiable Adjoined Networks (DAN), a training paradigm that augments AN by using neural architecture search to jointly learn both the width and the weights for each layer of the smaller network. DAN achieves ResNet-50 level accuracy on ImageNet with $3.8\times$ fewer parameters and $2.2\times$ fewer FLOPs.  ( 2 min )
    Barwise Compression Schemes for Audio-Based Music Structure Analysis. (arXiv:2202.04981v2 [cs.SD] UPDATED)
    Music Structure Analysis (MSA) consists in segmenting a music piece in several distinct sections. We approach MSA within a compression framework, under the hypothesis that the structure is more easily revealed by a simplified representation of the original content of the song. More specifically, under the hypothesis that MSA is correlated with similarities occurring at the bar scale, this article introduces the use of linear and non-linear compression schemes on barwise audio signals. Compressed representations capture the most salient components of the different bars in the song and are then used to infer the song structure using a dynamic programming algorithm. This work explores both low-rank approximation models such as Principal Component Analysis or Nonnegative Matrix Factorization and "piece-specific" Auto-Encoding Neural Networks, with the objective to learn latent representations specific to a given song. Such approaches do not rely on supervision nor annotations, which are well-known to be tedious to collect and possibly ambiguous in MSA description. In our experiments, several unsupervised compression schemes achieve a level of performance comparable to that of state-of-the-art supervised methods (for 3s tolerance) on the RWC-Pop dataset, showcasing the importance of the barwise compression processing for MSA.  ( 2 min )
    GitTables: A Large-Scale Corpus of Relational Tables. (arXiv:2106.07258v4 [cs.DB] UPDATED)
    The success of deep learning has sparked interest in improving relational table tasks, like data preparation and search, with table representation models trained on large table corpora. Existing table corpora primarily contain tables extracted from HTML pages, limiting the capability to represent offline database tables. To train and evaluate high-capacity models for applications beyond the Web, we need resources with tables that resemble relational database tables. Here we introduce GitTables, a corpus of 1M relational tables extracted from GitHub. Our continuing curation aims at growing the corpus to at least 10M tables. Analyses of GitTables show that its structure, content, and topical coverage differ significantly from existing table corpora. We annotate table columns in GitTables with semantic types, hierarchical relations and descriptions from Schema.org and DBpedia. The evaluation of our annotation pipeline on the T2Dv2 benchmark illustrates that our approach provides results on par with human annotations. We present three applications of GitTables, demonstrating its value for learned semantic type detection models, schema completion methods, and benchmarks for table-to-KG matching, data search, and preparation. We make the corpus and code available at https://gittables.github.io.  ( 2 min )
    NICE: Robust Scheduling through Reinforcement Learning-Guided Integer Programming. (arXiv:2109.12171v3 [cs.LG] UPDATED)
    Integer programs provide a powerful abstraction for representing a wide range of real-world scheduling problems. Despite their ability to model general scheduling problems, solving large-scale integer programs (IP) remains a computational challenge in practice. The incorporation of more complex objectives such as robustness to disruptions further exacerbates the computational challenge. We present NICE (Neural network IP Coefficient Extraction), a novel technique that combines reinforcement learning and integer programming to tackle the problem of robust scheduling. More specifically, NICE uses reinforcement learning to approximately represent complex objectives in an integer programming formulation. We use NICE to determine assignments of pilots to a flight crew schedule so as to reduce the impact of disruptions. We compare NICE with (1) a baseline integer programming formulation that produces a feasible crew schedule, and (2) a robust integer programming formulation that explicitly tries to minimize the impact of disruptions. Our experiments show that, across a variety of scenarios, NICE produces schedules resulting in 33% to 48% fewer disruptions than the baseline formulation. Moreover, in more severely constrained scheduling scenarios in which the robust integer program fails to produce a schedule within 90 minutes, NICE is able to build robust schedules in less than 2 seconds on average.  ( 2 min )
    Approximating Gradients for Differentiable Quality Diversity in Reinforcement Learning. (arXiv:2202.03666v2 [cs.LG] UPDATED)
    Consider the problem of training robustly capable agents. One approach is to generate a diverse collection of agent polices. Training can then be viewed as a quality diversity (QD) optimization problem, where we search for a collection of performant policies that are diverse with respect to quantified behavior. Recent work shows that differentiable quality diversity (DQD) algorithms greatly accelerate QD optimization when exact gradients are available. However, agent policies typically assume that the environment is not differentiable. To apply DQD algorithms to training agent policies, we must approximate gradients for performance and behavior. We propose two variants of the current state-of-the-art DQD algorithm that compute gradients via approximation methods common in reinforcement learning (RL). We evaluate our approach on four simulated locomotion tasks. One variant achieves results comparable to the current state-of-the-art in combining QD and RL, while the other performs comparably in two locomotion tasks. These results provide insight into the limitations of current DQD algorithms in domains where gradients must be approximated. Source code is available at https://github.com/icaros-usc/dqd-rl
    Universal approximation property of invertible neural networks. (arXiv:2204.07415v1 [cs.LG])
    Invertible neural networks (INNs) are neural network architectures with invertibility by design. Thanks to their invertibility and the tractability of Jacobian, INNs have various machine learning applications such as probabilistic modeling, generative modeling, and representation learning. However, their attractive properties often come at the cost of restricting the layer designs, which poses a question on their representation power: can we use these models to approximate sufficiently diverse functions? To answer this question, we have developed a general theoretical framework to investigate the representation power of INNs, building on a structure theorem of differential geometry. The framework simplifies the approximation problem of diffeomorphisms, which enables us to show the universal approximation properties of INNs. We apply the framework to two representative classes of INNs, namely Coupling-Flow-based INNs (CF-INNs) and Neural Ordinary Differential Equations (NODEs), and elucidate their high representation power despite the restrictions on their architectures.
    Sparsifying the Update Step in Graph Neural Networks. (arXiv:2109.00909v3 [cs.LG] UPDATED)
    Message-Passing Neural Networks (MPNNs), the most prominent Graph Neural Network (GNN) framework, celebrate much success in the analysis of graph-structured data. Concurrently, the sparsification of Neural Network models attracts a great amount of academic and industrial interest. In this paper we conduct a structured, empirical study of the effect of sparsification on the trainable part of MPNNs known as the Update step. To this end, we design a series of models to successively sparsify the linear transform in the Update step. Specifically, we propose the ExpanderGNN model with a tuneable sparsification rate and the Activation-Only GNN, which has no linear transform in the Update step. In agreement with a growing trend in the literature the sparsification paradigm is changed by initialising sparse neural network architectures rather than expensively sparsifying already trained architectures. Our novel benchmark models enable a better understanding of the influence of the Update step on model performance and outperform existing simplified benchmark models such as the Simple Graph Convolution. The ExpanderGNNs, and in some cases the Activation-Only models, achieve performance on par with their vanilla counterparts on several downstream tasks, while containing significantly fewer trainable parameters. Our code is publicly available at: https://github.com/ChangminWu/ExpanderGNN.
    Weakly-supervised Temporal Path Representation Learning with Contrastive Curriculum Learning -- Extended Version. (arXiv:2203.16110v3 [cs.LG] UPDATED)
    In step with the digitalization of transportation, we are witnessing a growing range of path-based smart-city applications, e.g., travel-time estimation and travel path ranking. A temporal path(TP) that includes temporal information, e.g., departure time, into the path is fundamental to enable such applications. In this setting, it is essential to learn generic temporal path representations(TPRs) that consider spatial and temporal correlations simultaneously and that can be used in different applications, i.e., downstream tasks. Existing methods fail to achieve the goal since (i) supervised methods require large amounts of task-specific labels when training and thus fail to generalize the obtained TPRs to other tasks; (ii) through unsupervised methods can learn generic representations, they disregard the temporal aspect, leading to sub-optimal results. To contend with the limitations of existing solutions, we propose a Weakly-Supervised Contrastive (WSC) learning model. We first propose a temporal path encoder that encodes both the spatial and temporal information of a temporal path into a TPR. To train the encoder, we introduce weak labels that are easy and inexpensive to obtain and are relevant to different tasks, e.g., temporal labels indicating peak vs. off-peak hours from departure times. Based on the weak labels, we construct meaningful positive and negative temporal path samples by considering both spatial and temporal information, which facilities training the encoder using contrastive learning by pulling closer to the positive samples' representations while pushing away the negative samples' representations. To better guide contrastive learning, we propose a learning strategy based on Curriculum Learning such that the learning performs from easy to hard training instances. Experiments studies verify the effectiveness of the proposed method.
    Model Reprogramming: Resource-Efficient Cross-Domain Machine Learning. (arXiv:2202.10629v2 [cs.LG] UPDATED)
    In data-rich domains such as vision, language, and speech, deep learning prevails to deliver high-performance task-specific models and can even learn general task-agnostic representations for efficient finetuning to downstream tasks. However, deep learning in resource-limited domains still faces the following challenges including (i) limited data, (ii) constrained model development cost, and (iii) lack of adequate pre-trained models for effective finetuning. This paper introduces a new technique called model reprogramming to bridge this gap. Model reprogramming enables resource-efficient cross-domain machine learning by repurposing and reusing a well-developed pre-trained model from a source domain to solve tasks in a target domain without model finetuning, where the source and target domains can be vastly different. In many applications, model reprogramming outperforms transfer learning and training from scratch. This paper elucidates the methodology of model reprogramming, summarizes existing use cases, provides a theoretical explanation on the success of model reprogramming, and concludes with a discussion on open-ended research questions and opportunities. A list of model reprogramming studies is actively maintained and updated at https://github.com/IBM/model-reprogramming.  ( 2 min )
    Statistical-Computational Trade-offs in Tensor PCA and Related Problems via Communication Complexity. (arXiv:2204.07526v1 [math.ST])
    Tensor PCA is a stylized statistical inference problem introduced by Montanari and Richard to study the computational difficulty of estimating an unknown parameter from higher-order moment tensors. Unlike its matrix counterpart, Tensor PCA exhibits a statistical-computational gap, i.e., a sample size regime where the problem is information-theoretically solvable but conjectured to be computationally hard. This paper derives computational lower bounds on the run-time of memory bounded algorithms for Tensor PCA using communication complexity. These lower bounds specify a trade-off among the number of passes through the data sample, the sample size, and the memory required by any algorithm that successfully solves Tensor PCA. While the lower bounds do not rule out polynomial-time algorithms, they do imply that many commonly-used algorithms, such as gradient descent and power method, must have a higher iteration count when the sample size is not large enough. Similar lower bounds are obtained for Non-Gaussian Component Analysis, a family of statistical estimation problems in which low-order moment tensors carry no information about the unknown parameter. Finally, stronger lower bounds are obtained for an asymmetric variant of Tensor PCA and related statistical estimation problems. These results explain why many estimators for these problems use a memory state that is significantly larger than the effective dimensionality of the parameter of interest.
    The Distributed Information Bottleneck reveals the explanatory structure of complex systems. (arXiv:2204.07576v1 [cs.LG])
    The fruits of science are relationships made comprehensible, often by way of approximation. While deep learning is an extremely powerful way to find relationships in data, its use in science has been hindered by the difficulty of understanding the learned relationships. The Information Bottleneck (IB) is an information theoretic framework for understanding a relationship between an input and an output in terms of a trade-off between the fidelity and complexity of approximations to the relationship. Here we show that a crucial modification -- distributing bottlenecks across multiple components of the input -- opens fundamentally new avenues for interpretable deep learning in science. The Distributed Information Bottleneck throttles the downstream complexity of interactions between the components of the input, deconstructing a relationship into meaningful approximations found through deep learning without requiring custom-made datasets or neural network architectures. Applied to a complex system, the approximations illuminate aspects of the system's nature by restricting -- and monitoring -- the information about different components incorporated into the approximation. We demonstrate the Distributed IB's explanatory utility in systems drawn from applied mathematics and condensed matter physics. In the former, we deconstruct a Boolean circuit into approximations that isolate the most informative subsets of input components without requiring exhaustive search. In the latter, we localize information about future plastic rearrangement in the static structure of a sheared glass, and find the information to be more or less diffuse depending on the system's preparation. By way of a principled scheme of approximations, the Distributed IB brings much-needed interpretability to deep learning and enables unprecedented analysis of information flow through a system.
    Solving the Dirichlet problem for the Monge-Amp\`ere equation using neural networks. (arXiv:2110.03310v2 [stat.ML] UPDATED)
    The Monge-Amp\`ere equation is a fully nonlinear partial differential equation (PDE) of fundamental importance in analysis, geometry and in the applied sciences. In this paper we solve the Dirichlet problem associated with the Monge-Amp\`ere equation using neural networks and we show that an ansatz using deep input convex neural networks can be used to find the unique convex solution. As part of our analysis we study the effect of singularities, discontinuities and noise in the source function, we consider nontrivial domains, and we investigate how the method performs in higher dimensions. We also compare this method to an alternative approach in which standard feed-forward networks are used together with a loss function which penalizes lack of convexity.
    Transferability Properties of Graph Neural Networks. (arXiv:2112.04629v2 [cs.LG] UPDATED)
    Graph neural networks (GNNs) are composed of layers consisting of graph convolutions and pointwise nonlinearities. Due to their invariance and stability properties, GNNs are provably successful at learning representations from data supported on moderate-scale graphs. However, they are difficult to learn on large-scale graphs. In this paper, we study the problem of training GNNs on graphs of moderate size and transferring them to large-scale graphs. We use graph limits called graphons to define limit objects for graph filters and GNNs -- graphon filters and graphon neural networks (WNNs) -- which we interpret as generative models for graph filters and GNNs. We then show that graphon filters and WNNs can be approximated by graph filters and GNNs sampled from them on weighted and stochastic graphs. Because the error of these approximations can be upper bounded, by a triangle inequality argument we can further bound the error of transferring a graph filter or a GNN across graphs. Our results show that (i) the transference error decreases with the graph size, and (ii) that graph filters have a transferability-discriminability tradeoff that in GNNs is alleviated by the scattering behavior of the nonlinearity. These findings are demonstrated empirically in a movie recommendation problem and in a decentralized control task.
    Kernel similarity matching with Hebbian neural networks. (arXiv:2204.07475v1 [cs.NE])
    Recent works have derived neural networks with online correlation-based learning rules to perform \textit{kernel similarity matching}. These works applied existing linear similarity matching algorithms to nonlinear features generated with random Fourier methods. In this paper attempt to perform kernel similarity matching by directly learning the nonlinear features. Our algorithm proceeds by deriving and then minimizing an upper bound for the sum of squared errors between output and input kernel similarities. The construction of our upper bound leads to online correlation-based learning rules which can be implemented with a 1 layer recurrent neural network. In addition to generating high-dimensional linearly separable representations, we show that our upper bound naturally yields representations which are sparse and selective for specific input patterns. We compare the approximation quality of our method to neural random Fourier method and variants of the popular but non-biological "Nystr{\"o}m" method for approximating the kernel matrix. Our method appears to be comparable or better than randomly sampled Nystr{\"o}m methods when the outputs are relatively low dimensional (although still potentially higher dimensional than the inputs) but less faithful when the outputs are very high dimensional.
    Prototype-based Domain Generalization Framework for Subject-Independent Brain-Computer Interfaces. (arXiv:2204.07358v1 [eess.SP])
    Brain-computer interface (BCI) is challenging to use in practice due to the inter/intra-subject variability of electroencephalography (EEG). The BCI system, in general, necessitates a calibration technique to obtain subject/session-specific data in order to tune the model each time the system is utilized. This issue is acknowledged as a key hindrance to BCI, and a new strategy based on domain generalization has recently evolved to address it. In light of this, we've concentrated on developing an EEG classification framework that can be applied directly to data from unknown domains (i.e. subjects), using only data acquired from separate subjects previously. For this purpose, in this paper, we proposed a framework that employs the open-set recognition technique as an auxiliary task to learn subject-specific style features from the source dataset while helping the shared feature extractor with mapping the features of the unseen target dataset as a new unseen domain. Our aim is to impose cross-instance style in-variance in the same domain and reduce the open space risk on the potential unseen subject in order to improve the generalization ability of the shared feature extractor. Our experiments showed that using the domain information as an auxiliary network increases the generalization performance.  ( 2 min )
    End-to-End Sensitivity-Based Filter Pruning. (arXiv:2204.07412v1 [cs.CV])
    In this paper, we present a novel sensitivity-based filter pruning algorithm (SbF-Pruner) to learn the importance scores of filters of each layer end-to-end. Our method learns the scores from the filter weights, enabling it to account for the correlations between the filters of each layer. Moreover, by training the pruning scores of all layers simultaneously our method can account for layer interdependencies, which is essential to find a performant sparse sub-network. Our proposed method can train and generate a pruned network from scratch in a straightforward, one-stage training process without requiring a pretrained network. Ultimately, we do not need layer-specific hyperparameters and pre-defined layer budgets, since SbF-Pruner can implicitly determine the appropriate number of channels in each layer. Our experimental results on different network architectures suggest that SbF-Pruner outperforms advanced pruning methods. Notably, on CIFAR-10, without requiring a pretrained baseline network, we obtain 1.02% and 1.19% accuracy gain on ResNet56 and ResNet110, compared to the baseline reported for state-of-the-art pruning algorithms. This is while SbF-Pruner reduces parameter-count by 52.3% (for ResNet56) and 54% (for ResNet101), which is better than the state-of-the-art pruning algorithms with a high margin of 9.5% and 6.6%.  ( 2 min )
    Email Spam Detection Using Hierarchical Attention Hybrid Deep Learning Method. (arXiv:2204.07390v1 [cs.CL])
    Email is one of the most widely used ways to communicate, with millions of people and businesses relying on it to communicate and share knowledge and information on a daily basis. Nevertheless, the rise in email users has occurred a dramatic increase in spam emails in recent years. Processing and managing emails properly for individuals and companies are getting increasingly difficult. This article proposes a novel technique for email spam detection that is based on a combination of convolutional neural networks, gated recurrent units, and attention mechanisms. During system training, the network is selectively focused on necessary parts of the email text. The usage of convolution layers to extract more meaningful, abstract, and generalizable features by hierarchical representation is the major contribution of this study. Additionally, this contribution incorporates cross-dataset evaluation, which enables the generation of more independent performance results from the model's training dataset. According to cross-dataset evaluation results, the proposed technique advances the results of the present attention-based techniques by utilizing temporal convolutions, which give us more flexible receptive field sizes are utilized. The suggested technique's findings are compared to those of state-of-the-art models and show that our approach outperforms them.  ( 2 min )
    Towards Building a Personalized Dialogue Generator via Implicit User Persona Detection. (arXiv:2204.07372v1 [cs.CL])
    Current works in the generation of personalized dialogue primarily contribute to the agent avoiding contradictory persona and driving the response more informative. However, we found that the generated responses from these models are mostly self-centered with little care for the other party since they ignore the user's persona. Moreover, we consider high-quality transmission is essentially built based on apprehending the persona of the other party. Motivated by this, we propose a novel personalized dialogue generator by detecting implicit user persona. Because it's difficult to collect a large number of personas for each user, we attempt to model the user's potential persona and its representation from the dialogue absence of any external information. Perception variable and fader variable are conceived utilizing Conditional Variational Inference. The two latent variables simulate the process of people being aware of the other party's persona and producing the corresponding expression in conversation. Finally, Posterior-discriminated Regularization is presented to enhance the training procedure. Empirical studies demonstrate that compared with the state-of-the-art methods, ours is more concerned with the user's persona and outperforms in evaluations.  ( 2 min )
    Anomalous Sound Detection Based on Machine Activity Detection. (arXiv:2204.07353v1 [eess.AS])
    We have developed an unsupervised anomalous sound detection method for machine condition monitoring that utilizes an auxiliary task -- detecting when the target machine is active. First, we train a model that detects machine activity by using normal data with machine activity labels and then use the activity-detection error as the anomaly score for a given sound clip if we have access to the ground-truth activity labels in the inference phase. If these labels are not available, the anomaly score is calculated through outlier detection on the embedding vectors obtained by the activity-detection model. Solving this auxiliary task enables the model to learn the difference between the target machine sounds and similar background noise, which makes it possible to identify small deviations in the target sounds. Experimental results showed that the proposed method improves the anomaly-detection performance of the conventional method complementarily by means of an ensemble.  ( 2 min )
    SSR-HEF: Crowd Counting with Multi-Scale Semantic Refining and Hard Example Focusing. (arXiv:2204.07406v1 [cs.CV])
    Crowd counting based on density maps is generally regarded as a regression task.Deep learning is used to learn the mapping between image content and crowd density distribution. Although great success has been achieved, some pedestrians far away from the camera are difficult to be detected. And the number of hard examples is often larger. Existing methods with simple Euclidean distance algorithm indiscriminately optimize the hard and easy examples so that the densities of hard examples are usually incorrectly predicted to be lower or even zero, which results in large counting errors. To address this problem, we are the first to propose the Hard Example Focusing(HEF) algorithm for the regression task of crowd counting. The HEF algorithm makes our model rapidly focus on hard examples by attenuating the contribution of easy examples.Then higher importance will be given to the hard examples with wrong estimations. Moreover, the scale variations in crowd scenes are large, and the scale annotations are labor-intensive and expensive. By proposing a multi-Scale Semantic Refining (SSR) strategy, lower layers of our model can break through the limitation of deep learning to capture semantic features of different scales to sufficiently deal with the scale variation. We perform extensive experiments on six benchmark datasets to verify the proposed method. Results indicate the superiority of our proposed method over the state-of-the-art methods. Moreover, our designed model is smaller and faster.  ( 2 min )
    Crowd counting with segmentation attention convolutional neural network. (arXiv:2204.07380v1 [cs.CV])
    Deep learning occupies an undisputed dominance in crowd counting. In this paper, we propose a novel convolutional neural network (CNN) architecture called SegCrowdNet. Despite the complex background in crowd scenes, the proposeSegCrowdNet still adaptively highlights the human head region and suppresses the non-head region by segmentation. With the guidance of an attention mechanism, the proposed SegCrowdNet pays more attention to the human head region and automatically encodes the highly refined density map. The crowd count can be obtained by integrating the density map. To adapt the variation of crowd counts, SegCrowdNet intelligently classifies the crowd count of each image into several groups. In addition, the multi-scale features are learned and extracted in the proposed SegCrowdNet to overcome the scale variations of the crowd. To verify the effectiveness of our proposed method, extensive experiments are conducted on four challenging datasets. The results demonstrate that our proposed SegCrowdNet achieves excellent performance compared with the state-of-the-art methods.  ( 2 min )
    Towards a Unified Framework for Uncertainty-aware Nonlinear Variable Selection with Theoretical Guarantees. (arXiv:2204.07293v1 [stat.ML])
    We develop a simple and unified framework for nonlinear variable selection that incorporates model uncertainty and is compatible with a wide range of machine learning models (e.g., tree ensembles, kernel methods and neural network). In particular, for a learned nonlinear model $f(\mathbf{x})$, we consider quantifying the importance of an input variable $\mathbf{x}^j$ using the integrated gradient measure $\psi_j = \Vert \frac{\partial}{\partial \mathbf{x}^j} f(\mathbf{x})\Vert^2_2$. We then (1) provide a principled approach for quantifying variable selection uncertainty by deriving its posterior distribution, and (2) show that the approach is generalizable even to non-differentiable models such as tree ensembles. Rigorous Bayesian nonparametric theorems are derived to guarantee the posterior consistency and asymptotic uncertainty of the proposed approach. Extensive simulation confirms that the proposed algorithm outperforms existing classic and recent variable selection methods.  ( 2 min )
    Spatio-Temporal-Frequency Graph Attention Convolutional Network for Aircraft Recognition Based on Heterogeneous Radar Network. (arXiv:2204.07360v1 [eess.SP])
    This paper proposes a knowledge-and-data-driven graph neural network-based collaboration learning model for reliable aircraft recognition in a heterogeneous radar network. The aircraft recognizability analysis shows that: (1) the semantic feature of an aircraft is motion patterns driven by the kinetic characteristics, and (2) the grammatical features contained in the radar cross-section (RCS) signals present spatial-temporal-frequency (STF) diversity decided by both the electromagnetic radiation shape and motion pattern of the aircraft. Then a STF graph attention convolutional network (STFGACN) is developed to distill semantic features from the RCS signals received by the heterogeneous radar network. Extensive experiment results verify that the STFGACN outperforms the baseline methods in terms of detection accuracy, and ablation experiments are carried out to further show that the expansion of the information dimension can gain considerable benefits to perform robustly in the low signal-to-noise ratio region.  ( 2 min )
    Structural Analysis of Branch-and-Cut and the Learnability of Gomory Mixed Integer Cuts. (arXiv:2204.07312v1 [math.OC])
    The incorporation of cutting planes within the branch-and-bound algorithm, known as branch-and-cut, forms the backbone of modern integer programming solvers. These solvers are the foremost method for solving discrete optimization problems and thus have a vast array of applications in machine learning, operations research, and many other fields. Choosing cutting planes effectively is a major research topic in the theory and practice of integer programming. We conduct a novel structural analysis of branch-and-cut that pins down how every step of the algorithm is affected by changes in the parameters defining the cutting planes added to the input integer program. Our main application of this analysis is to derive sample complexity guarantees for using machine learning to determine which cutting planes to apply during branch-and-cut. These guarantees apply to infinite families of cutting planes, such as the family of Gomory mixed integer cuts, which are responsible for the main breakthrough speedups of integer programming solvers. We exploit geometric and combinatorial structure of branch-and-cut in our analysis, which provides a key missing piece for the recent generalization theory of branch-and-cut.  ( 2 min )
    Knowledgebra: An Algebraic Learning Framework for Knowledge Graph. (arXiv:2204.07328v1 [cs.LG])
    Knowledge graph (KG) representation learning aims to encode entities and relations into dense continuous vector spaces such that knowledge contained in a dataset could be consistently represented. Dense embeddings trained from KG datasets benefit a variety of downstream tasks such as KG completion and link prediction. However, existing KG embedding methods fell short to provide a systematic solution for the global consistency of knowledge representation. We developed a mathematical language for KG based on an observation of their inherent algebraic structure, which we termed as Knowledgebra. By analyzing five distinct algebraic properties, we proved that the semigroup is the most reasonable algebraic structure for the relation embedding of a general knowledge graph. We implemented an instantiation model, SemE, using simple matrix semigroups, which exhibits state-of-the-art performance on standard datasets. Moreover, we proposed a regularization-based method to integrate chain-like logic rules derived from human knowledge into embedding training, which further demonstrates the power of the developed language. As far as we know, by applying abstract algebra in statistical learning, this work develops the first formal language for general knowledge graphs, and also sheds light on the problem of neural-symbolic integration from an algebraic perspective.  ( 2 min )
    XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems to Improve Language Understanding. (arXiv:2204.07316v1 [cs.CL])
    Transformer-based models are widely used in natural language understanding (NLU) tasks, and multimodal transformers have been effective in visual-language tasks. This study explores distilling visual information from pretrained multimodal transformers to pretrained language encoders. Our framework is inspired by cross-modal encoders' success in visual-language tasks while we alter the learning objective to cater to the language-heavy characteristics of NLU. After training with a small number of extra adapting steps and finetuned, the proposed XDBERT (cross-modal distilled BERT) outperforms pretrained-BERT in general language understanding evaluation (GLUE), situations with adversarial generations (SWAG) benchmarks, and readability benchmarks. We analyze the performance of XDBERT on GLUE to show that the improvement is likely visually grounded.  ( 2 min )
    Ensemble diverse hypotheses and knowledge distillation for unsupervised cross-subject adaptation. (arXiv:2204.07308v1 [cs.RO])
    Recognizing human locomotion intent and activities is important for controlling the wearable robots while walking in complex environments. However, human-robot interface signals are usually user-dependent, which causes that the classifier trained on source subjects performs poorly on new subjects. To address this issue, this paper designs the ensemble diverse hypotheses and knowledge distillation (EDHKD) method to realize unsupervised cross-subject adaptation. EDH mitigates the divergence between labeled data of source subjects and unlabeled data of target subjects to accurately classify the locomotion modes of target subjects without labeling data. Compared to previous domain adaptation methods based on the single learner, which may only learn a subset of features from input signals, EDH can learn diverse features by incorporating multiple diverse feature generators and thus increases the accuracy and decreases the variance of classifying target data, but it sacrifices the efficiency. To solve this problem, EDHKD (student) distills the knowledge from the EDH (teacher) to a single network to remain efficient and accurate. The performance of the EDHKD is theoretically proved and experimentally validated on a 2D moon dataset and two public human locomotion datasets. Experimental results show that the EDHKD outperforms all other methods. The EDHKD can classify target data with 96.9%, 94.4%, and 97.4% average accuracy on the above three datasets with a short computing time (1 ms). Compared to a benchmark (BM) method, the EDHKD increases 1.3% and 7.1% average accuracy for classifying the locomotion modes of target subjects. The EDHKD also stabilizes the learning curves. Therefore, the EDHKD is significant for increasing the generalization ability and efficiency of the human intent prediction and human activity recognition system, which will improve human-robot interactions.  ( 2 min )
    Methodical Advice Collection and Reuse in Deep Reinforcement Learning. (arXiv:2204.07254v1 [cs.LG])
    Reinforcement learning (RL) has shown great success in solving many challenging tasks via use of deep neural networks. Although using deep learning for RL brings immense representational power, it also causes a well-known sample-inefficiency problem. This means that the algorithms are data-hungry and require millions of training samples to converge to an adequate policy. One way to combat this issue is to use action advising in a teacher-student framework, where a knowledgeable teacher provides action advice to help the student. This work considers how to better leverage uncertainties about when a student should ask for advice and if the student can model the teacher to ask for less advice. The student could decide to ask for advice when it is uncertain or when both it and its model of the teacher are uncertain. In addition to this investigation, this paper introduces a new method to compute uncertainty for a deep RL agent using a secondary neural network. Our empirical results show that using dual uncertainties to drive advice collection and reuse may improve learning performance across several Atari games.  ( 2 min )
    A Differentially Private Probabilistic Framework for Modeling the Variability Across Federated Datasets of Heterogeneous Multi-View Observations. (arXiv:2204.07352v1 [cs.LG])
    We propose a novel federated learning paradigm to model data variability among heterogeneous clients in multi-centric studies. Our method is expressed through a hierarchical Bayesian latent variable model, where client-specific parameters are assumed to be realization from a global distribution at the master level, which is in turn estimated to account for data bias and variability across clients. We show that our framework can be effectively optimized through expectation maximization (EM) over latent master's distribution and clients' parameters. We also introduce formal differential privacy (DP) guarantees compatibly with our EM optimization scheme. We tested our method on the analysis of multi-modal medical imaging data and clinical scores from distributed clinical datasets of patients affected by Alzheimer's disease. We demonstrate that our method is robust when data is distributed either in iid and non-iid manners, even when local parameters perturbation is included to provide DP guarantees. Moreover, the variability of data, views and centers can be quantified in an interpretable manner, while guaranteeing high-quality data reconstruction as compared to state-of-the-art autoencoding models and federated learning schemes. The code is available at https://gitlab.inria.fr/epione/federated-multi-views-ppca.  ( 2 min )
    Crowd counting with crowd attention convolutional neural network. (arXiv:2204.07347v1 [cs.CV])
    Crowd counting is a challenging problem due to the scene complexity and scale variation. Although deep learning has achieved great improvement in crowd counting, scene complexity affects the judgement of these methods and they usually regard some objects as people mistakenly; causing potentially enormous errors in the crowd counting result. To address the problem, we propose a novel end-to-end model called Crowd Attention Convolutional Neural Network (CAT-CNN). Our CAT-CNN can adaptively assess the importance of a human head at each pixel location by automatically encoding a confidence map. With the guidance of the confidence map, the position of human head in estimated density map gets more attention to encode the final density map, which can avoid enormous misjudgements effectively. The crowd count can be obtained by integrating the final density map. To encode a highly refined density map, the total crowd count of each image is classified in a designed classification task and we first explicitly map the prior of the population-level category to feature maps. To verify the efficiency of our proposed method, extensive experiments are conducted on three highly challenging datasets. Results establish the superiority of our method over many state-of-the-art methods.  ( 2 min )
    Graph Pooling for Graph Neural Networks: Progress, Challenges, and Opportunities. (arXiv:2204.07321v1 [cs.LG])
    Graph neural networks have emerged as a leading architecture for many graph-level tasks such as graph classification and graph generation with a notable improvement. Among these tasks, graph pooling is an essential component of graph neural network architectures for obtaining a holistic graph-level representation of the entire graph. Although a great variety of methods have been proposed in this promising and fast-developing research field, to the best of our knowledge, little effort has been made to systematically summarize these methods. To set the stage for the development of future works, in this paper, we attempt to fill this gap by providing a broad review of recent methods on graph pooling. Specifically, 1) we first propose a taxonomy of existing graph pooling methods and provide a mathematical summary for each category; 2) next, we provide an overview of the libraries related to graph pooling, including the commonly used datasets, model architectures for downstream tasks, and open-source implementations; 3) then, we further outline in brief the applications that incorporate the idea of graph pooling in a number of domains; 4) and finally, we discuss some critical challenges faced by the current studies and share our insights on potential directions for improving graph pooling in the future.  ( 2 min )
    Revisiting the Adversarial Robustness-Accuracy Tradeoff in Robot Learning. (arXiv:2204.07373v1 [cs.RO])
    Adversarial training (i.e., training on adversarially perturbed input data) is a well-studied method for making neural networks robust to potential adversarial attacks during inference. However, the improved robustness does not come for free but rather is accompanied by a decrease in overall model accuracy and performance. Recent work has shown that, in practical robot learning applications, the effects of adversarial training do not pose a fair trade-off but inflict a net loss when measured in holistic robot performance. This work revisits the robustness-accuracy trade-off in robot learning by systematically analyzing if recent advances in robust training methods and theory in conjunction with adversarial robot learning can make adversarial training suitable for real-world robot applications. We evaluate a wide variety of robot learning tasks ranging from autonomous driving in a high-fidelity environment amenable to sim-to-real deployment, to mobile robot gesture recognition. Our results demonstrate that, while these techniques make incremental improvements on the trade-off on a relative scale, the negative side-effects caused by adversarial training still outweigh the improvements by an order of magnitude. We conclude that more substantial advances in robust learning methods are necessary before they can benefit robot learning tasks in practice.  ( 2 min )
    Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make a Difference. (arXiv:2204.07305v1 [cs.CV])
    Few-shot learning (FSL) is an important and topical problem in computer vision that has motivated extensive research into numerous methods spanning from sophisticated meta-learning methods to simple transfer learning baselines. We seek to push the limits of a simple-but-effective pipeline for more realistic and practical settings of few-shot image classification. To this end, we explore few-shot learning from the perspective of neural network architecture, as well as a three stage pipeline of network updates under different data supplies, where unsupervised external data is considered for pre-training, base categories are used to simulate few-shot tasks for meta-training, and the scarcely labelled data of an novel task is taken for fine-tuning. We investigate questions such as: (1) How pre-training on external data benefits FSL? (2) How state-of-the-art transformer architectures can be exploited? and (3) How fine-tuning mitigates domain shift? Ultimately, we show that a simple transformer-based pipeline yields surprisingly good performance on standard benchmarks such as Mini-ImageNet, CIFAR-FS, CDFSL and Meta-Dataset. Our code and demo are available at https://hushell.github.io/pmf.  ( 2 min )
    auton-survival: an Open-Source Package for Regression, Counterfactual Estimation, Evaluation and Phenotyping with Censored Time-to-Event Data. (arXiv:2204.07276v1 [cs.LG])
    Applications of machine learning in healthcare often require working with time-to-event prediction tasks including prognostication of an adverse event, re-hospitalization or death. Such outcomes are typically subject to censoring due to loss of follow up. Standard machine learning methods cannot be applied in a straightforward manner to datasets with censored outcomes. In this paper, we present auton-survival, an open-source repository of tools to streamline working with censored time-to-event or survival data. auton-survival includes tools for survival regression, adjustment in the presence of domain shift, counterfactual estimation, phenotyping for risk stratification, evaluation, as well as estimation of treatment effects. Through real world case studies employing a large subset of the SEER oncology incidence data, we demonstrate the ability of auton-survival to rapidly support data scientists in answering complex health and epidemiological questions.  ( 2 min )
    Unsupervised Probabilistic Models for Sequential Electronic Health Records. (arXiv:2204.07292v1 [cs.LG])
    We develop an unsupervised probabilistic model for heterogeneous Electronic Health Record (EHR) data. Utilizing a mixture model formulation, our approach directly models sequences of arbitrary length, such as medications and laboratory results. This allows for subgrouping and incorporation of the dynamics underlying heterogeneous data types. The model consists of a layered set of latent variables that encode underlying structure in the data. These variables represent subject subgroups at the top layer, and unobserved states for sequences in the second layer. We train this model on episodic data from subjects receiving medical care in the Kaiser Permanente Northern California integrated healthcare delivery system. The resulting properties of the trained model generate novel insight from these complex and multifaceted data. In addition, we show how the model can be used to analyze sequences that contribute to assessment of mortality likelihood.  ( 2 min )
    Causal Transformer for Estimating Counterfactual Outcomes. (arXiv:2204.07258v1 [cs.LG])
    Estimating counterfactual outcomes over time from observational data is relevant for many applications (e.g., personalized medicine). Yet, state-of-the-art methods build upon simple long short-term memory (LSTM) networks, thus rendering inferences for complex, long-range dependencies challenging. In this paper, we develop a novel Causal Transformer for estimating counterfactual outcomes over time. Our model is specifically designed to capture complex, long-range dependencies among time-varying confounders. For this, we combine three transformer subnetworks with separate inputs for time-varying covariates, previous treatments, and previous outcomes into a joint network with in-between cross-attentions. We further develop a custom, end-to-end training procedure for our Causal Transformer. Specifically, we propose a novel counterfactual domain confusion loss to address confounding bias: it aims to learn adversarial balanced representations, so that they are predictive of the next outcome but non-predictive of the current treatment assignment. We evaluate our Causal Transformer based on synthetic and real-world datasets, where it achieves superior performance over current baselines. To the best of our knowledge, this is the first work proposing transformer-based architecture for estimating counterfactual outcomes from longitudinal data.  ( 2 min )
    Characterizing the Efficiency vs. Accuracy Trade-off for Long-Context NLP Models. (arXiv:2204.07288v1 [cs.CL])
    With many real-world applications of Natural Language Processing (NLP) comprising of long texts, there has been a rise in NLP benchmarks that measure the accuracy of models that can handle longer input sequences. However, these benchmarks do not consider the trade-offs between accuracy, speed, and power consumption as input sizes or model sizes are varied. In this work, we perform a systematic study of this accuracy vs. efficiency trade-off on two widely used long-sequence models - Longformer-Encoder-Decoder (LED) and Big Bird - during fine-tuning and inference on four datasets from the SCROLLS benchmark. To study how this trade-off differs across hyperparameter settings, we compare the models across four sequence lengths (1024, 2048, 3072, 4096) and two model sizes (base and large) under a fixed resource budget. We find that LED consistently achieves better accuracy at lower energy costs than Big Bird. For summarization, we find that increasing model size is more energy efficient than increasing sequence length for higher accuracy. However, this comes at the cost of a large drop in inference speed. For question answering, we find that smaller models are both more efficient and more accurate due to the larger training batch sizes possible under a fixed resource budget.  ( 2 min )
    Active Learning for Regression and Classification by Inverse Distance Weighting. (arXiv:2204.07177v1 [cs.LG])
    This paper proposes an active learning algorithm for solving regression and classification problems based on inverse-distance weighting functions for selecting the feature vectors to query. The algorithm has the following features: (i) supports both pool-based and population-based sampling; (ii) is independent of the type of predictor used; (iii) can handle known and unknown constraints on the queryable feature vectors; and (iv) can run either sequentially, or in batch mode, depending on how often the predictor is retrained. The method's potential is shown in numerical tests on illustrative synthetic problems and real-world regression and classification datasets from the UCI repository. A Python implementation of the algorithm that we call IDEAL (Inverse-Distance based Exploration for Active Learning), is available at \url{this http URL}.  ( 2 min )
    Hierarchical Embedded Bayesian Additive Regression Trees. (arXiv:2204.07207v1 [stat.ME])
    We propose a simple yet powerful extension of Bayesian Additive Regression Trees which we name Hierarchical Embedded BART (HE-BART). The model allows for random effects to be included at the terminal node level of a set of regression trees, making HE-BART a non-parametric alternative to mixed effects models which avoids the need for the user to specify the structure of the random effects in the model, whilst maintaining the prediction and uncertainty calibration properties of standard BART. Using simulated and real-world examples, we demonstrate that this new extension yields superior predictions for many of the standard mixed effects models' example data sets, and yet still provides consistent estimates of the random effect variances. In a future version of this paper, we outline its use in larger, more advanced data sets and structures.  ( 2 min )
    Spatio-Temporal Analysis of Transformer based Architecture for Attention Estimation from EEG. (arXiv:2204.07162v1 [q-bio.NC])
    For many years now, understanding the brain mechanism has been a great research subject in many different fields. Brain signal processing and especially electroencephalogram (EEG) has recently known a growing interest both in academia and industry. One of the main examples is the increasing number of Brain-Computer Interfaces (BCI) aiming to link brains and computers. In this paper, we present a novel framework allowing us to retrieve the attention state, i.e degree of attention given to a specific task, from EEG signals. While previous methods often consider the spatial relationship in EEG through electrodes and process them in recurrent or convolutional based architecture, we propose here to also exploit the spatial and temporal information with a transformer-based network that has already shown its supremacy in many machine-learning (ML) related studies, e.g. machine translation. In addition to this novel architecture, an extensive study on the feature extraction methods, frequential bands and temporal windows length has also been carried out. The proposed network has been trained and validated on two public datasets and achieves higher results compared to state-of-the-art models. As well as proposing better results, the framework could be used in real applications, e.g. Attention Deficit Hyperactivity Disorder (ADHD) symptoms or vigilance during a driving assessment.  ( 2 min )
    Testing distributional assumptions of learning algorithms. (arXiv:2204.07196v1 [cs.LG])
    There are many important high dimensional function classes that have fast agnostic learning algorithms when strong assumptions on the distribution of examples can be made, such as Gaussianity or uniformity over the domain. But how can one be sufficiently confident that the data indeed satisfies the distributional assumption, so that one can trust in the output quality of the agnostic learning algorithm? We propose a model by which to systematically study the design of tester-learner pairs $(\mathcal{A},\mathcal{T})$, such that if the distribution on examples in the data passes the tester $\mathcal{T}$ then one can safely trust the output of the agnostic learner $\mathcal{A}$ on the data. To demonstrate the power of the model, we apply it to the classical problem of agnostically learning halfspaces under the standard Gaussian distribution and present a tester-learner pair with a combined run-time of $n^{\tilde{O}(1/\epsilon^4)}$. This qualitatively matches that of the best known ordinary agnostic learning algorithms for this task. In contrast, finite sample Gaussian distribution testers do not exist for the $L_1$ and EMD distance measures. A key step in the analysis is a novel characterization of concentration and anti-concentration properties of a distribution whose low-degree moments approximately match those of a Gaussian. We also use tools from polynomial approximation theory. In contrast, we show strong lower bounds on the combined run-times of tester-learner pairs for the problems of agnostically learning convex sets under the Gaussian distribution and for monotone Boolean functions under the uniform distribution over $\{0,1\}^n$. Through these lower bounds we exhibit natural problems where there is a dramatic gap between standard agnostic learning run-time and the run-time of the best tester-learner pair.  ( 2 min )
    Causal Disentanglement with Network Information for Debiased Recommendations. (arXiv:2204.07221v1 [cs.IR])
    Recommender systems aim to recommend new items to users by learning user and item representations. In practice, these representations are highly entangled as they consist of information about multiple factors, including user's interests, item attributes along with confounding factors such as user conformity, and item popularity. Considering these entangled representations for inferring user preference may lead to biased recommendations (e.g., when the recommender model recommends popular items even if they do not align with the user's interests). Recent research proposes to debias by modeling a recommender system from a causal perspective. The exposure and the ratings are analogous to the treatment and the outcome in the causal inference framework, respectively. The critical challenge in this setting is accounting for the hidden confounders. These confounders are unobserved, making it hard to measure them. On the other hand, since these confounders affect both the exposure and the ratings, it is essential to account for them in generating debiased recommendations. To better approximate hidden confounders, we propose to leverage network information (i.e., user-social and user-item networks), which are shown to influence how users discover and interact with an item. Aside from the user conformity, aspects of confounding such as item popularity present in the network information is also captured in our method with the aid of \textit{causal disentanglement} which unravels the learned representations into independent factors that are responsible for (a) modeling the exposure of an item to the user, (b) predicting the ratings, and (c) controlling the hidden confounders. Experiments on real-world datasets validate the effectiveness of the proposed model for debiasing recommender systems.  ( 2 min )
    Robotic and Generative Adversarial Attacks in Offline Writer-independent Signature Verification. (arXiv:2204.07246v1 [cs.RO])
    This study explores how robots and generative approaches can be used to mount successful false-acceptance adversarial attacks on signature verification systems. Initially, a convolutional neural network topology and data augmentation strategy are explored and tuned, producing an 87.12% accurate model for the verification of 2,640 human signatures. Two robots are then tasked with forging 50 signatures, where 25 are used for the verification attack, and the remaining 25 are used for tuning of the model to defend against them. Adversarial attacks on the system show that there exists an information security risk; the Line-us robotic arm can fool the system 24% of the time and the iDraw 2.0 robot 32% of the time. A conditional GAN finds similar success, with around 30% forged signatures misclassified as genuine. Following fine-tune transfer learning of robotic and generative data, adversarial attacks are reduced below the model threshold by both robots and the GAN. It is observed that tuning the model reduces the risk of attack by robots to 8% and 12%, and that conditional generative adversarial attacks can be reduced to 4% when 25 images are presented and 5% when 1000 images are presented.  ( 2 min )
    The training response law explains how deep neural networks learn. (arXiv:2204.07291v1 [cond-mat.dis-nn])
    Deep neural network is the widely applied technology in this decade. In spite of the fruitful applications, the mechanism behind that is still to be elucidated. We study the learning process with a very simple supervised learning encoding problem. As a result, we found a simple law, in the training response, which describes neural tangent kernel. The response consists of a power law like decay multiplied by a simple response kernel. We can construct a simple mean-field dynamical model with the law, which explains how the network learns. In the learning, the input space is split into sub-spaces along competition between the kernels. With the iterated splits and the aging, the network gets more complexity, but finally loses its plasticity.  ( 2 min )
    Convergence and Implicit Regularization Properties of Gradient Descent for Deep Residual Networks. (arXiv:2204.07261v1 [cs.LG])
    We prove linear convergence of gradient descent to a global minimum for the training of deep residual networks with constant layer width and smooth activation function. We further show that the trained weights, as a function of the layer index, admits a scaling limit which is H\"older continuous as the depth of the network tends to infinity. The proofs are based on non-asymptotic estimates of the loss function and of norms of the network weights along the gradient descent path. We illustrate the relevance of our theoretical results to practical settings using detailed numerical experiments on supervised learning problems.  ( 2 min )
    Learning two-phase microstructure evolution using neural operators and autoencoder architectures. (arXiv:2204.07230v1 [cond-mat.mtrl-sci])
    Phase-field modeling is an effective mesoscale method for capturing the evolution dynamics of materials, e.g., in spinodal decomposition of a two-phase mixture. However, the accuracy of high-fidelity phase field models comes at a substantial computational cost. Hence, fast and generalizable surrogate models are needed to alleviate the cost in computationally taxing processes such as in optimization and design of materials. The intrinsic discontinuous nature of the physical phenomena incurred by the presence of sharp phase boundaries makes the training of the surrogate model cumbersome. We develop a new framework that integrates a convolutional autoencoder architecture with a deep neural operator (DeepONet) to learn the dynamic evolution of a two-phase mixture. We utilize the convolutional autoencoder to provide a compact representation of the microstructure data in a low-dimensional latent space. DeepONet, which consists of two sub-networks, one for encoding the input function at a fixed number of sensors locations (branch net) and another for encoding the locations for the output functions (trunk net), learns the mesoscale dynamics of the microstructure evolution in the latent space. The decoder part of the convolutional autoencoder can then reconstruct the time-evolved microstructure from the DeepONet predictions. The result is an efficient and accurate accelerated phase-field framework that outperforms other neural-network-based approaches while at the same time being robust to noisy inputs.  ( 2 min )
    Minimizing Control for Credit Assignment with Strong Feedback. (arXiv:2204.07249v1 [cs.NE])
    The success of deep learning attracted interest in whether the brain learns hierarchical representations using gradient-based learning. However, current biologically plausible methods for gradient-based credit assignment in deep neural networks need infinitesimally small feedback signals, which is problematic in biologically realistic noisy environments and at odds with experimental evidence in neuroscience showing that top-down feedback can significantly influence neural activity. Building upon deep feedback control (DFC), a recently proposed credit assignment method, we combine strong feedback influences on neural activity with gradient-based learning and show that this naturally leads to a novel view on neural network optimization. Instead of gradually changing the network weights towards configurations with low output loss, weight updates gradually minimize the amount of feedback required from a controller that drives the network to the supervised output label. Moreover, we show that the use of strong feedback in DFC allows learning forward and feedback connections simultaneously, using a learning rule fully local in space and time. We complement our theoretical results with experiments on standard computer-vision benchmarks, showing competitive performance to backpropagation as well as robustness to noise. Overall, our work presents a fundamentally novel view of learning as control minimization, while sidestepping biologically unrealistic assumptions.  ( 2 min )
    Brazilian Court Documents Clustered by Similarity Together Using Natural Language Processing Approaches with Transformers. (arXiv:2204.07182v1 [cs.AI])
    Recent advances in Artificial intelligence (AI) have leveraged promising results in solving complex problems in the area of Natural Language Processing (NLP), being an important tool to help in the expeditious resolution of judicial proceedings in the legal area. In this context, this work targets the problem of detecting the degree of similarity between judicial documents that can be achieved in the inference group, by applying six NLP techniques based on transformers, namely BERT, GPT-2 and RoBERTa pre-trained in the Brazilian Portuguese language and the same specialized using 210,000 legal proceedings. Documents were pre-processed and had their content transformed into a vector representation using these NLP techniques. Unsupervised learning was used to cluster the lawsuits, calculating the quality of the model based on the cosine of the distance between the elements of the group to its centroid. We noticed that models based on transformers present better performance when compared to previous research, highlighting the RoBERTa model specialized in the Brazilian Portuguese language, making it possible to advance in the current state of the art in the area of NLP applied to the legal sector.  ( 2 min )
    Relaxing Equivariance Constraints with Non-stationary Continuous Filters. (arXiv:2204.07178v1 [cs.LG])
    Equivariances provide useful inductive biases in neural network modeling, with the translation equivariance of convolutional neural networks being a canonical example. Equivariances can be embedded in architectures through weight-sharing and place symmetry constraints on the functions a neural network can represent. The type of symmetry is typically fixed and has to be chosen in advance. Although some tasks are inherently equivariant, many tasks do not strictly follow such symmetries. In such cases, equivariance constraints can be overly restrictive. In this work, we propose a parameter-efficient relaxation of equivariance that can effectively interpolate between a (i) non-equivariant linear product, (ii) a strict-equivariant convolution, and (iii) a strictly-invariant mapping. The proposed parameterization can be thought of as a building block to allow adjustable symmetry structure in neural networks. Compared to non-equivariant or strict-equivariant baselines, we experimentally verify that soft equivariance leads to improved performance in terms of test accuracy on CIFAR-10 and CIFAR-100 image classification tasks.  ( 2 min )
    Alternating Mahalanobis Distance Minimization for Stable and Accurate CP Decomposition. (arXiv:2204.07208v1 [cs.LG])
    CP decomposition (CPD) is prevalent in chemometrics, signal processing, data mining and many more fields. While many algorithms have been proposed to compute the CPD, alternating least squares (ALS) remains one of the most widely used algorithm for computing the decomposition. Recent works have introduced the notion of eigenvalues and singular values of a tensor and explored applications of eigenvectors and singular vectors in areas like signal processing, data analytics and in various other fields. We introduce a new formulation for deriving singular values and vectors of a tensor by considering the critical points of a function different from what is used in the previous work. Computing these critical points in an alternating manner motivates an alternating optimization algorithm which corresponds to alternating least squares algorithm in the matrix case. However, for tensors with order greater than equal to $3$, it minimizes an objective function which is different from the commonly used least squares loss. Alternating optimization of this new objective leads to simple updates to the factor matrices with the same asymptotic computational cost as ALS. We show that a subsweep of this algorithm can achieve a superlinear convergence rate for exact CPD with known rank and verify it experimentally. We then view the algorithm as optimizing a Mahalanobis distance with respect to each factor with ground metric dependent on the other factors. This perspective allows us to generalize our approach to interpolate between updates corresponding to the ALS and the new algorithm to manage the tradeoff between stability and fitness of the decomposition. Our experimental results show that for approximating synthetic and real-world tensors, this algorithm and its variants converge to a better conditioned decomposition with comparable and sometimes better fitness as compared to the ALS algorithm.  ( 2 min )
    Harnessing Interpretable Machine Learning for Origami Feature Design and Pattern Selection. (arXiv:2204.07235v1 [cond-mat.soft])
    Engineering design of origami systems is challenging because comparing different origami patterns requires using categorical features and evaluating multi-physics behavior targets introduces multi-objective problems. This work shows that a decision tree machine learning method is particularly suitable for the inverse design of origami. This interpretable machine learning method can reveal complex interactions between categorical features and continuous features for comparing different origami patterns, can tackle multi-objective problems for designing active origami with multi-physics performance targets, and can extend existing origami shape fitting algorithms to further consider non-geometrical performances of origami systems. The proposed framework shows a holistic way of designing active origami systems for various applications such as metamaterials, deployable structures, soft robots, biomedical devices, and many more.  ( 2 min )
    Physics-Aware Recurrent Convolutional (PARC) Neural Networks to Assimilate Meso-scale Reactive Mechanics of Energetic Materials. (arXiv:2204.07234v1 [cond-mat.mtrl-sci])
    The thermomechanical properties of energetic materials (EM) are known to be a function of their microscopic structures, i.e., morphological configurations of crystals and pores. This microstructural dependency has motivated vigorous research in the EM community, seeking to engineer material microstructures with targeted properties and performance under the materials-by-design paradigm. However, establishing the complex structure-property-performance (SPP) relationships of EMs demands extensive experimental and simulation efforts, and assimilating and encapsulating these relationships in usable models is a challenge. Here, we present a novel deep learning method, Physics-Aware Recurrent Convolutional (PARC) Neural Network, that can "learn" the mesoscale thermo-mechanics of EM microstructures during the shock-to-detonation transition (SDT). We show that this new approach can produce accurate high-fidelity predictions of time-evolving temperature and pressure fields of the same quality as the state-of-the-art direct numerical simulations (DNS), despite the dramatic reduction of computing time, from hours and days on a high-performance computing cluster (HPC) to a little more than a second on a commodity laptop. We also demonstrate that PARC can provide physical insights, i.e., the artificial neurons can illuminate the underlying physics by identifying which microstructural features led to critical hotspots and what are the characteristics of "critical" versus "non-critical" microstructures. This new knowledge generated alongside the capacity to conduct high-throughput experiments will broaden our theoretical understanding of the initiation mechanisms of EM detonation, as a step towards engineering EMs with specific properties.  ( 2 min )
    Diagnosing and Fixing Manifold Overfitting in Deep Generative Models. (arXiv:2204.07172v1 [stat.ML])
    Likelihood-based, or explicit, deep generative models use neural networks to construct flexible high-dimensional densities. This formulation directly contradicts the manifold hypothesis, which states that observed data lies on a low-dimensional manifold embedded in high-dimensional ambient space. In this paper we investigate the pathologies of maximum-likelihood training in the presence of this dimensionality mismatch. We formally prove that degenerate optima are achieved wherein the manifold itself is learned but not the distribution on it, a phenomenon we call manifold overfitting. We propose a class of two-step procedures consisting of a dimensionality reduction step followed by maximum-likelihood density estimation, and prove that they recover the data-generating distribution in the nonparametric regime, thus avoiding manifold overfitting. We also show that these procedures enable density estimation on the manifolds learned by implicit models, such as generative adversarial networks, hence addressing a major shortcoming of these models. Several recently proposed methods are instances of our two-step procedures; we thus unify, extend, and theoretically justify a large class of models.  ( 2 min )
  • Open

    auton-survival: an Open-Source Package for Regression, Counterfactual Estimation, Evaluation and Phenotyping with Censored Time-to-Event Data. (arXiv:2204.07276v1 [cs.LG])
    Applications of machine learning in healthcare often require working with time-to-event prediction tasks including prognostication of an adverse event, re-hospitalization or death. Such outcomes are typically subject to censoring due to loss of follow up. Standard machine learning methods cannot be applied in a straightforward manner to datasets with censored outcomes. In this paper, we present auton-survival, an open-source repository of tools to streamline working with censored time-to-event or survival data. auton-survival includes tools for survival regression, adjustment in the presence of domain shift, counterfactual estimation, phenotyping for risk stratification, evaluation, as well as estimation of treatment effects. Through real world case studies employing a large subset of the SEER oncology incidence data, we demonstrate the ability of auton-survival to rapidly support data scientists in answering complex health and epidemiological questions.
    Diagnosing and Fixing Manifold Overfitting in Deep Generative Models. (arXiv:2204.07172v1 [stat.ML])
    Likelihood-based, or explicit, deep generative models use neural networks to construct flexible high-dimensional densities. This formulation directly contradicts the manifold hypothesis, which states that observed data lies on a low-dimensional manifold embedded in high-dimensional ambient space. In this paper we investigate the pathologies of maximum-likelihood training in the presence of this dimensionality mismatch. We formally prove that degenerate optima are achieved wherein the manifold itself is learned but not the distribution on it, a phenomenon we call manifold overfitting. We propose a class of two-step procedures consisting of a dimensionality reduction step followed by maximum-likelihood density estimation, and prove that they recover the data-generating distribution in the nonparametric regime, thus avoiding manifold overfitting. We also show that these procedures enable density estimation on the manifolds learned by implicit models, such as generative adversarial networks, hence addressing a major shortcoming of these models. Several recently proposed methods are instances of our two-step procedures; we thus unify, extend, and theoretically justify a large class of models.
    Solving the Dirichlet problem for the Monge-Amp\`ere equation using neural networks. (arXiv:2110.03310v2 [stat.ML] UPDATED)
    The Monge-Amp\`ere equation is a fully nonlinear partial differential equation (PDE) of fundamental importance in analysis, geometry and in the applied sciences. In this paper we solve the Dirichlet problem associated with the Monge-Amp\`ere equation using neural networks and we show that an ansatz using deep input convex neural networks can be used to find the unique convex solution. As part of our analysis we study the effect of singularities, discontinuities and noise in the source function, we consider nontrivial domains, and we investigate how the method performs in higher dimensions. We also compare this method to an alternative approach in which standard feed-forward networks are used together with a loss function which penalizes lack of convexity.
    Adjoined Networks: A Training Paradigm with Applications to Network Compression. (arXiv:2006.05624v5 [cs.LG] UPDATED)
    Compressing deep neural networks while maintaining accuracy is important when we want to deploy large, powerful models in production and/or edge devices. One common technique used to achieve this goal is knowledge distillation. Typically, the output of a static pre-defined teacher (a large base network) is used as soft labels to train and transfer information to a student (or smaller) network. In this paper, we introduce Adjoined Networks, or AN, a learning paradigm that trains both the original base network and the smaller compressed network together. In our training approach, the parameters of the smaller network are shared across both the base and the compressed networks. Using our training paradigm, we can simultaneously compress (the student network) and regularize (the teacher network) any architecture. In this paper, we focus on popular CNN-based architectures used for computer vision tasks. We conduct an extensive experimental evaluation of our training paradigm on various large-scale datasets. Using ResNet-50 as the base network, AN achieves 71.8% top-1 accuracy with only 1.8M parameters and 1.6 GFLOPs on the ImageNet data-set. We further propose Differentiable Adjoined Networks (DAN), a training paradigm that augments AN by using neural architecture search to jointly learn both the width and the weights for each layer of the smaller network. DAN achieves ResNet-50 level accuracy on ImageNet with $3.8\times$ fewer parameters and $2.2\times$ fewer FLOPs.
    Novelty Search in Representational Space for Sample Efficient Exploration. (arXiv:2009.13579v3 [cs.LG] UPDATED)
    We present a new approach for efficient exploration which leverages a low-dimensional encoding of the environment learned with a combination of model-based and model-free objectives. Our approach uses intrinsic rewards that are based on the distance of nearest neighbors in the low dimensional representational space to gauge novelty. We then leverage these intrinsic rewards for sample-efficient exploration with planning routines in representational space for hard exploration tasks with sparse rewards. One key element of our approach is the use of information theoretic principles to shape our representations in a way so that our novelty reward goes beyond pixel similarity. We test our approach on a number of maze tasks, as well as a control problem and show that our exploration approach is more sample-efficient compared to strong baselines.
    Causal Disentanglement with Network Information for Debiased Recommendations. (arXiv:2204.07221v1 [cs.IR])
    Recommender systems aim to recommend new items to users by learning user and item representations. In practice, these representations are highly entangled as they consist of information about multiple factors, including user's interests, item attributes along with confounding factors such as user conformity, and item popularity. Considering these entangled representations for inferring user preference may lead to biased recommendations (e.g., when the recommender model recommends popular items even if they do not align with the user's interests). Recent research proposes to debias by modeling a recommender system from a causal perspective. The exposure and the ratings are analogous to the treatment and the outcome in the causal inference framework, respectively. The critical challenge in this setting is accounting for the hidden confounders. These confounders are unobserved, making it hard to measure them. On the other hand, since these confounders affect both the exposure and the ratings, it is essential to account for them in generating debiased recommendations. To better approximate hidden confounders, we propose to leverage network information (i.e., user-social and user-item networks), which are shown to influence how users discover and interact with an item. Aside from the user conformity, aspects of confounding such as item popularity present in the network information is also captured in our method with the aid of \textit{causal disentanglement} which unravels the learned representations into independent factors that are responsible for (a) modeling the exposure of an item to the user, (b) predicting the ratings, and (c) controlling the hidden confounders. Experiments on real-world datasets validate the effectiveness of the proposed model for debiasing recommender systems.
    Statistical-Computational Trade-offs in Tensor PCA and Related Problems via Communication Complexity. (arXiv:2204.07526v1 [math.ST])
    Tensor PCA is a stylized statistical inference problem introduced by Montanari and Richard to study the computational difficulty of estimating an unknown parameter from higher-order moment tensors. Unlike its matrix counterpart, Tensor PCA exhibits a statistical-computational gap, i.e., a sample size regime where the problem is information-theoretically solvable but conjectured to be computationally hard. This paper derives computational lower bounds on the run-time of memory bounded algorithms for Tensor PCA using communication complexity. These lower bounds specify a trade-off among the number of passes through the data sample, the sample size, and the memory required by any algorithm that successfully solves Tensor PCA. While the lower bounds do not rule out polynomial-time algorithms, they do imply that many commonly-used algorithms, such as gradient descent and power method, must have a higher iteration count when the sample size is not large enough. Similar lower bounds are obtained for Non-Gaussian Component Analysis, a family of statistical estimation problems in which low-order moment tensors carry no information about the unknown parameter. Finally, stronger lower bounds are obtained for an asymmetric variant of Tensor PCA and related statistical estimation problems. These results explain why many estimators for these problems use a memory state that is significantly larger than the effective dimensionality of the parameter of interest.
    Soft Truncation: A Universal Training Technique of Score-based Diffusion Model for High Precision Score Estimation. (arXiv:2106.05527v4 [cs.LG] UPDATED)
    Recent advances in diffusion models bring the state-of-the art performance on image generation tasks. However, empirical results on previous research in diffusion models imply that there is an inverse correlation on performances for density estimation and sample generation. This paper analyzes that the inverse correlation arises because density estimation is mostly contributed from small diffusion time, whereas sample generation mainly depends on large diffusion time. However, training score network on both small and large diffusion time is demanding because of the loss imbalance issue. To successfully train the score network on both small and large diffusion time, this paper introduces a training technique, Soft Truncation, that softens the truncation time for every mini-batch update, which is universally applicable to any types of diffusion models. It turns out that Soft Truncation is equivalent to a diffusion model with a general weight, and we prove the variational bound of the general weighted diffusion model. In view of this variational bound, Soft Truncation becomes a natural way to train the score network. In experiments, Soft Truncation achieves the state-of-the-art performance on CIFAR-10, CelebA, CelebA-HQ $256\times 256$, and STL-10 datasets.  ( 2 min )
    Latent Gaussian Model Boosting. (arXiv:2105.08966v4 [cs.LG] UPDATED)
    Latent Gaussian models and boosting are widely used techniques in statistics and machine learning. Tree-boosting shows excellent prediction accuracy on many data sets, but potential drawbacks are that it assumes conditional independence of samples, produces discontinuous predictions for, e.g., spatial data, and it can have difficulty with high-cardinality categorical variables. Latent Gaussian models, such as Gaussian process and grouped random effects models, are flexible prior models which explicitly model dependence among samples and which allow for efficient learning of predictor functions and for making probabilistic predictions. However, existing latent Gaussian models usually assume either a zero or a linear prior mean function which can be an unrealistic assumption. This article introduces a novel approach that combines boosting and latent Gaussian models to remedy the above-mentioned drawbacks and to leverage the advantages of both techniques. We obtain increased prediction accuracy compared to existing approaches in both simulated and real-world data experiments.  ( 2 min )
    Two-Step Meta-Learning for Time-Series Forecasting Ensemble. (arXiv:2011.10545v2 [stat.ML] UPDATED)
    Amounts of historical data collected increase and business intelligence applicability with automatic forecasting of time series are in high demand. While no single time series modeling method is universal to all types of dynamics, forecasting using an ensemble of several methods is often seen as a compromise. Instead of fixing ensemble diversity and size, we propose to predict these aspects adaptively using meta-learning. Meta-learning here considers two separate random forest regression models, built on 390 time-series features, to rank 22 univariate forecasting methods and recommend ensemble size. The forecasting ensemble is consequently formed from methods ranked as the best, and forecasts are pooled using either simple or weighted average (with a weight corresponding to reciprocal rank). The proposed approach was tested on 12561 micro-economic time-series (expanded to 38633 for various forecasting horizons) of M4 competition where meta-learning outperformed Theta and Comb benchmarks by relative forecasting errors for all data types and horizons. Best overall results were achieved by weighted pooling with a symmetric mean absolute percentage error of 9.21% versus 11.05% obtained using the Theta method.  ( 2 min )
    Conditional Hierarchical Bayesian Tucker Decomposition for Genetic Data Analysis. (arXiv:1911.12426v3 [cs.LG] UPDATED)
    We develop methods for reducing the dimensionality of large data sets, common in biomedical applications. Learning about patients using genetic data often includes more features than observations, which makes direct supervised learning difficult. One method of reducing the feature space is to use latent Dirichlet allocation to group genetic variants in an unsupervised manner. Latent Dirichlet allocation describes a patient as a mixture of topics corresponding to genetic variants. This can be generalized as a Bayesian tensor decomposition to account for multiple feature variables. Our most significant contributions are with hierarchical topic modeling. We design distinct methods of incorporating hierarchical topic modeling, based on nested Chinese restaurant processes and Pachinko Allocation Machine, into Bayesian tensor decomposition. We apply these models to examine patients with one of four common types of cancer (breast, lung, prostate, and colorectal) and siblings with and without autism spectrum disorder. We linked the genes with their biological pathways and combine this information into a tensor of patients, counts of their genetic variants, and the genes' membership in pathways. We find that our trained models outperform baseline models, with respect to coherence, by up to 40%.  ( 2 min )
    A Statistical Decision-Theoretical Perspective on the Two-Stage Approach to Parameter Estimation. (arXiv:2204.00036v2 [stat.ME] UPDATED)
    One of the most important problems in system identification and statistics is how to estimate the unknown parameters of a given model. Optimization methods and specialized procedures, such as Empirical Minimization (EM) can be used in case the likelihood function can be computed. For situations where one can only simulate from a parametric model, but the likelihood is difficult or impossible to evaluate, a technique known as the Two-Stage (TS) Approach can be applied to obtain reliable parametric estimates. Unfortunately, there is currently a lack of theoretical justification for TS. In this paper, we propose a statistical decision-theoretical derivation of TS, which leads to Bayesian and Minimax estimators. We also show how to apply the TS approach on models for independent and identically distributed samples, by computing quantiles of the data as a first step, and using a linear function as the second stage. The proposed method is illustrated via numerical simulations.  ( 2 min )
    Bayesian Nonparametrics for Sparse Dynamic Networks. (arXiv:1607.01624v2 [stat.ML] UPDATED)
    In this paper we propose a Bayesian nonparametric approach to modelling sparse time-varying networks. A positive parameter is associated to each node of a network, which models the sociability of that node. Sociabilities are assumed to evolve over time, and are modelled via a dynamic point process model. The model is able to capture long term evolution of the sociabilities. Moreover, it yields sparse graphs, where the number of edges grows subquadratically with the number of nodes. The evolution of the sociabilities is described by a tractable time-varying generalised gamma process. We provide some theoretical insights into the model and apply it to three datasets: a simulated network, a network of hyperlinks between communities on Reddit, and a network of co-occurences of words in Reuters news articles after the September 11th attacks.  ( 2 min )
    Enforcing fairness in private federated learning via the modified method of differential multipliers. (arXiv:2109.08604v2 [cs.LG] UPDATED)
    Federated learning with differential privacy, or private federated learning, provides a strategy to train machine learning models while respecting users' privacy. However, differential privacy can disproportionately degrade the performance of the models on under-represented groups, as these parts of the distribution are difficult to learn in the presence of noise. Existing approaches for enforcing fairness in machine learning models have considered the centralized setting, in which the algorithm has access to the users' data. This paper introduces an algorithm to enforce group fairness in private federated learning, where users' data does not leave their devices. First, the paper extends the modified method of differential multipliers to empirical risk minimization with fairness constraints, thus providing an algorithm to enforce fairness in the central setting. Then, this algorithm is extended to the private federated learning setting. The proposed algorithm, \texttt{FPFL}, is tested on a federated version of the Adult dataset and an "unfair" version of the FEMNIST dataset. The experiments on these datasets show how private federated learning accentuates unfairness in the trained models, and how FPFL is able to mitigate such unfairness.  ( 2 min )
    Towards a Unified Framework for Uncertainty-aware Nonlinear Variable Selection with Theoretical Guarantees. (arXiv:2204.07293v1 [stat.ML])
    We develop a simple and unified framework for nonlinear variable selection that incorporates model uncertainty and is compatible with a wide range of machine learning models (e.g., tree ensembles, kernel methods and neural network). In particular, for a learned nonlinear model $f(\mathbf{x})$, we consider quantifying the importance of an input variable $\mathbf{x}^j$ using the integrated gradient measure $\psi_j = \Vert \frac{\partial}{\partial \mathbf{x}^j} f(\mathbf{x})\Vert^2_2$. We then (1) provide a principled approach for quantifying variable selection uncertainty by deriving its posterior distribution, and (2) show that the approach is generalizable even to non-differentiable models such as tree ensembles. Rigorous Bayesian nonparametric theorems are derived to guarantee the posterior consistency and asymptotic uncertainty of the proposed approach. Extensive simulation confirms that the proposed algorithm outperforms existing classic and recent variable selection methods.  ( 2 min )
    Tighter Theory for Local SGD on Identical and Heterogeneous Data. (arXiv:1909.04746v4 [cs.LG] UPDATED)
    We provide a new analysis of local SGD, removing unnecessary assumptions and elaborating on the difference between two data regimes: identical and heterogeneous. In both cases, we improve the existing theory and provide values of the optimal stepsize and optimal number of local iterations. Our bounds are based on a new notion of variance that is specific to local SGD methods with different data. The tightness of our results is guaranteed by recovering known statements when we plug $H=1$, where $H$ is the number of local steps. The empirical evidence further validates the severe impact of data heterogeneity on the performance of local SGD.  ( 2 min )
    Causal Transformer for Estimating Counterfactual Outcomes. (arXiv:2204.07258v1 [cs.LG])
    Estimating counterfactual outcomes over time from observational data is relevant for many applications (e.g., personalized medicine). Yet, state-of-the-art methods build upon simple long short-term memory (LSTM) networks, thus rendering inferences for complex, long-range dependencies challenging. In this paper, we develop a novel Causal Transformer for estimating counterfactual outcomes over time. Our model is specifically designed to capture complex, long-range dependencies among time-varying confounders. For this, we combine three transformer subnetworks with separate inputs for time-varying covariates, previous treatments, and previous outcomes into a joint network with in-between cross-attentions. We further develop a custom, end-to-end training procedure for our Causal Transformer. Specifically, we propose a novel counterfactual domain confusion loss to address confounding bias: it aims to learn adversarial balanced representations, so that they are predictive of the next outcome but non-predictive of the current treatment assignment. We evaluate our Causal Transformer based on synthetic and real-world datasets, where it achieves superior performance over current baselines. To the best of our knowledge, this is the first work proposing transformer-based architecture for estimating counterfactual outcomes from longitudinal data.  ( 2 min )
    Warped Dynamic Linear Models for Time Series of Counts. (arXiv:2110.14790v2 [stat.ME] UPDATED)
    Dynamic Linear Models (DLMs) are commonly employed for time series analysis due to their versatile structure, simple recursive updating, ability to handle missing data, and probabilistic forecasting. However, the options for count time series are limited: Gaussian DLMs require continuous data, while Poisson-based alternatives often lack sufficient modeling flexibility. We introduce a novel semiparametric methodology for count time series by warping a Gaussian DLM. The warping function has two components: a (nonparametric) transformation operator that provides distributional flexibility and a rounding operator that ensures the correct support for the discrete data-generating process. We develop conjugate inference for the warped DLM, which enables analytic and recursive updates for the state space filtering and smoothing distributions. We leverage these results to produce customized and efficient algorithms for inference and forecasting, including Monte Carlo simulation for offline analysis and an optimal particle filter for online inference. This framework unifies and extends a variety of discrete time series models and is valid for natural counts, rounded values, and multivariate observations. Simulation studies illustrate the excellent forecasting capabilities of the warped DLM. The proposed approach is applied to a multivariate time series of daily overdose counts and demonstrates both modeling and computational successes.  ( 2 min )
    Universal approximation property of invertible neural networks. (arXiv:2204.07415v1 [cs.LG])
    Invertible neural networks (INNs) are neural network architectures with invertibility by design. Thanks to their invertibility and the tractability of Jacobian, INNs have various machine learning applications such as probabilistic modeling, generative modeling, and representation learning. However, their attractive properties often come at the cost of restricting the layer designs, which poses a question on their representation power: can we use these models to approximate sufficiently diverse functions? To answer this question, we have developed a general theoretical framework to investigate the representation power of INNs, building on a structure theorem of differential geometry. The framework simplifies the approximation problem of diffeomorphisms, which enables us to show the universal approximation properties of INNs. We apply the framework to two representative classes of INNs, namely Coupling-Flow-based INNs (CF-INNs) and Neural Ordinary Differential Equations (NODEs), and elucidate their high representation power despite the restrictions on their architectures.  ( 2 min )
    Distributed Reconstruction of Noisy Pooled Data. (arXiv:2204.07491v1 [cs.IT])
    In the pooled data problem we are given a set of $n$ agents, each of which holds a hidden state bit, either $0$ or $1$. A querying procedure returns for a query set the sum of the states of the queried agents. The goal is to reconstruct the states using as few queries as possible. In this paper we consider two noise models for the pooled data problem. In the noisy channel model, the result for each agent flips with a certain probability. In the noisy query model, each query result is subject to random Gaussian noise. Our results are twofold. First, we present and analyze for both error models a simple and efficient distributed algorithm that reconstructs the initial states in a greedy fashion. Our novel analysis pins down the range of error probabilities and distributions for which our algorithm reconstructs the exact initial states with high probability. Secondly, we present simulation results of our algorithm and compare its performance with approximate message passing (AMP) algorithms that are conjectured to be optimal in a number of related problems.  ( 2 min )

  • Open

    [R][P] Mask Transfiner for High-Quality Instance Segmentation + Gradio Web Demo
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    [P] Spoonfy: Turn any foreign-language video into effective listening practice
    Video (Despacito, slightly NSFW): https://drive.google.com/file/d/12qYKv_yaqGr9GWvHPtE9ng2foJVPfpoE/view?usp=sharing Code & more info: https://github.com/athairus/SpoonfyDemo Discord: https://discord.gg/7wcZZzeSQk Spoonfy is essentially the so-called Telenovela method (learning languages through subtitled video) on steroids: This demo uses a finetuned Facebook's M2M100 model to translate Spanish to English (finetuned to do literal translation instead of ordinary translation) and a wav2vec2 model to get Spanish word timings to present the literal (aka word-by-word) translations as karaoke-style lyrics. What sets Spoonfy apart from other solutions is the way it leverages the massive body of existing subtitled content out there to create learning material. Also because it's FOSS. More details in the code's README. I've been working on this project by myself for a few months now, I hope you see potential behind it like I do! If so (but also if not), I'd love to hear what you think. And I'd love to get your help improving on what I already built. I have plenty of ideas for how to make the translations even more accurate, the system more robust & able to handle more sources of content (YouTube, TikTok, Blu-Rays), etc. Thanks for checking it out! submitted by /u/athairus [link] [comments]  ( 1 min )
    [D] Current work on knowledge representation with your preference, and use of language models
    In robotics and autonomous systems, knowledge representation is an important aspect. What is your favorite methods for knowledge representation, is it Formal logic or graphs or whatever and why you like that kind of representation. Considering the success of large language model isn't it a good time to use them in new kind of representation, so that robots or similar system can make better decisions in an environment. I still feel there is no common consensus in community for correct way of knowledge representation, correct me If I am wrong. submitted by /u/projekt_treadstone [link] [comments]  ( 1 min )
    [D] DALL-E 2 vs Disco Diffusion - SHOWDOWN!
    submitted by /u/nin_artificial [link] [comments]  ( 1 min )
    WACV vs. BMVC [R]
    How do they compare in terms of the communities, prestige, competitiveness, and impact. I have a paper accepted to a CVPR workshop and considering extending it and submitting to one of these. The work is based on explainability in medical vision. It's more methods-oriented rather than large-scale experiments. What are your suggestions? submitted by /u/avd4292 [link] [comments]  ( 1 min )
    [N] [P] Access 100+ image, video & audio datasets in seconds with one line of code & stream them while training ML models with Activeloop Hub (more at docs.activeloop.ai, description & links in the comments below)
    submitted by /u/davidbun [link] [comments]  ( 5 min )
    [D] Is it ok to promise a dataset in your paper, get published and then not release it?
    Recently, I decided to explore NeRF and found a very interesting dataset in the NeRS paper of 3D models, which was published in NeurIPS 2021 four months ago. Authors promised to release their dataset: The filtered dataset with anonymized personally identifiable information (e.g. license plates and phone numbers), masks, initial camera poses, and optimized NeRS cameras will be made available on the project page. However, if you check their project page or github repo — there is nothing there. I do not have much experience in machine learning, but wonder whether it's ok to do this? My thinking was that it is something to look down upon, but in this case it is done by Carnegie Mellon University (which is a top-tier one in ML?) on a top-tier conference (NeurIPS 2021). So I assume it's fine? submitted by /u/throwmeaway-account [link] [comments]  ( 5 min )
    [D] Wasserstein distance lipschitz vs gaussian distribution
    Hi, I heard there are different ways to calculate Wasserstein distance in Neural network context. First, We can convert 1-d Wasserstein loss to dual representation and constraint it's size(to makes lipschitz function). We need to do weight clip to make our model a lipschitz function. Second, we can make neural network output as Gaussian distribution and calculate easy form using neural network output as mean and covariance matrix. So, what are the advantages and disadvantages of comparing them? It may sound ambiguous, but I have not seen a study that compares the two about representation quality, computation, etc... Thank you for reading. submitted by /u/Spiritual_Fig3632 [link] [comments]  ( 1 min )
    [D] What do you use to make your blog/personal websites?
    I've noticed a lot of folks in ML have a personal website that doubles as a blog to write about their work/projects. As someone looking to build their own website along the same lines, I'm looking for frameworks to try and build it with. What framework do you use to design your site? submitted by /u/SwiftLynx [link] [comments]  ( 3 min )
    [Discussion] Interpretable Neural Network ... ?
    Hi All! I've been working on a linear method that extracts signals from images by learning a set of composable image filters. It can recompose an image using these filters as seen on this biological histology tissue (real on the right, recomposed on the left) ​ ​ https://preview.redd.it/czgdk6edx0u81.png?width=768&format=png&auto=webp&s=8768f93a749fff7dc41e576a74403096e113942e ​ Because it is a linear method that learns image filters - I had an idea: what if some components of a neural network could be replaced with a learnable set of filters? For those not in the know, image filters are similar to masks that upweight some parts of the image, and downweights other parts - similar to a highlighter to select text and a pen to cross out words. I show how in the figure below: ​ ​ https://preview.redd.it/2azco14ex0u81.jpg?width=499&format=pjpg&auto=webp&s=f3795fec06daa13da61ec155159a0ad865524530 ​ Learning a set of image filters with a neural network is a good idea, as neural networks are much more flexible and are considered to be "universal function approximations". So I wrote up a Pytorch package to pass the neural network feature weights from Convolutions and Max Pooling into the linear method to learn a relevant set of filters - results are comparable even on CIFAR10. The caveat is that there is no ReLU, no other activation functions, and no Dropout - only 1 main single linear layer that learns filters... an interpretable neural network! ​ Results are all here (including ipynb comparing with base CNN and VGG16) https://github.com/AskExplain/Interpretable-Neural-Net ​ I'll update the GitHub with some figures of why the single layer is interpretable soon ... ​ In the meantime - discuss! submitted by /u/TryToExplainHow [link] [comments]  ( 3 min )
    [P] New Graph Data Augmentation Library
    Hello! I recently built grafog, a graph data augmentation library on top of PyTorch Geometric. You can chain together graph mentations as done in albumentations or torchvision.transforms. Check it out: https://github.com/rish-16/grafog It has the following augmentations: Random Node Drop Random Edge Drop Normalize Features MixUp Strategy Node Feature Masking Edge Feature Masking https://preview.redd.it/c53r7gkrk0u81.png?width=689&format=png&auto=webp&s=8fbe668e82571a7fe5de9ebb5e4690dbd34032bb https://preview.redd.it/5zrj4gkrk0u81.png?width=863&format=png&auto=webp&s=5bd02ea4adaf86b8911fa89372be9f05f9010536 Happy augmenting! submitted by /u/rish-16 [link] [comments]
    [N]: How does OpenAI's DALL-E 2 work?
    submitted by /u/giugiacaglia [link] [comments]
    [D] What is the opposite of an ablative study?
    I've the feeling that this question may be really stupid but I make it anyway. In ML we often see ablative studies. How is the opposite of it called? In other words: A study that aim to improve a model, and once an improvement is reached, this new model is taken as basis for further investigations? submitted by /u/Rogitus [link] [comments]  ( 2 min )
  • Open

    Build & share machine learning apps directly in browser using Gradio in Python
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    AI Trippy Dream 19 - Exploring a Colorful Maze
    submitted by /u/LordPewPew777 [link] [comments]
    a better Boids simulation: An artificial life simulation of the flock of birds
    submitted by /u/Seitoh [link] [comments]  ( 1 min )
    New AI upscaler tool
    submitted by /u/Recent_Coffee_2551 [link] [comments]
    credit scoring for companies
    Hello everyone I'm newbie so pardon me if you find that my question is stupid. I'm working on a project here's it's description in a nutshell ( Classifying companies if they're going to bankrupt or not and based of the probability of default ( probability of bankruptcy) give each companies a score For example 88 percent to bankrupt score is D 21 percent to bankrupt score is B 3 percent to bankrupt score is a) My question is what kind of models should test ? Should i go for machine learning algorithms such as logistic regression, knn, SVM? Should I go for neural networks ANN? Or can I use deep learning models like MLP... probabilistic Neural Network? Any guidance or advice will be appreciated and thanks a lot. submitted by /u/YeccAnon4 [link] [comments]  ( 1 min )
  • Open

    LSTM for time series prediction
    Hi I am doing a project where I have to predict sales for a company and I am having some trouble with my LSTM model in python. All research I have done tells me that LSTM is as good, if not better, than a ARIMA model for forecasting on time series data, but my LSTM is significantly worse than my ARIMA model. Would it be possible for any one to help me to see if I have implemented it right? I have used both Tensorflow and Pytorch and both are way worse than the ARIMA model. submitted by /u/magnussendjoko [link] [comments]  ( 2 min )
  • Open

    Learning style of play (different agents' actions) in the same offline RL environment?
    Hi, everyone. I'm a relative novice in RL, so bear with me as I try to formulate my question. I'm working on a chess bot that can play moves like a player (imitate their style of play) that is chosen from a set of players (that the bot is trained on) , if I give the bot the previous x moves. Using more technical terms, I'm trying to create an agent that is given a sequence of states-actions of another agent (player) and some representation of who that agent (player) is, and predict the next action (continue playing in the style of that player). I'm fairly certain this is an RL problem, as I don't know how to frame it as a supervised learning problem (I might be wrong). I've seen some papers that abstract offline RL as a sequence modeling problem (Decision Transformer, Trajectory Transformer), so I'm fairly certain I should continue in a similar manner. But I'm having a hard time trying to understand how to treat the difference in players. My instinct was to use some representation of the player as the reward, but then how would I even optimize for it or even give it as an input? Do I just add the player as a feature in the game state, but then what should be the reward? Has this been done before, or something similar? I couldn't really find any paper or code that worked on differentiating the training data by who made it (I might not be wording it correctly). submitted by /u/OverhypeUnderdeliver [link] [comments]  ( 3 min )
  • Open

    Using Data Warehousing as a Service (DWaaS) To Improve Customer Experience
    Data has become a huge area of business, helping businesses to drive their intelligence, make better decisions, and formulate strategic plans for future growth. The post Using Data Warehousing as a Service (DWaaS) To Improve Customer Experience appeared first on Data Science Central.  ( 7 min )
    ML classifies gravitational-wave glitches with high accuracy
    The LIGO observatory can detect astronomical events from billions of light years away. Terabytes of complex daily data makes human analysis impossible. New study applies neural network with up to 97% classification accuracy. Caltech/MIT’s LIGO, the largest gravitational-wave observatory in the world, collects data on minute space-time ripples from cataclysmic astronomical events like colliding black… Read More »ML classifies gravitational-wave glitches with high accuracy The post ML classifies gravitational-wave glitches with high accuracy appeared first on Data Science Central.  ( 4 min )
    Zero Trust Principles: What is Zero Trust Model?
    The central principle of the Zero Trust model is based on the authentication and verification of every device connecting to the network before they are trusted. Former Forrester analyst and veteran of the high-technology world, John Kindervag, who has been actively part of a wide array of network technology projects, coined the term “Zero Trust”… Read More »Zero Trust Principles: What is Zero Trust Model? The post Zero Trust Principles: What is Zero Trust Model? appeared first on Data Science Central.  ( 4 min )

  • Open

    [R] Questions about ACL Rolling Review
    A few questions about ACL ARR: - If you request to reassign a reviewer, would the editor aim for reassigning all three reviewers or he would go for reassigning only that particular reviewer? Assume you have given a valid reason for reassignment and the editor is convinced. - If you request to reassign a reviewer, can the new reviewer see the previous reviews/scores before submitting his own review? Or he would access to the previous revision after submitting his own review. I already know (have heard) that in many cases reviewers are not available, and it becomes inevitable to get an entirely new set of reviews. I already know this. But my questions are about the case that the reviewer availability is not an issue. Juts trying to find out how things are managed submitted by /u/sim_inf [link] [comments]  ( 1 min )
    [R][P] MultiMAE: Multi-modal Multi-task Masked Autoencoders + Gradio Web Demo
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    [Discussion]Is it possible to find a SWE job with a DS master degree? Or would it be possible to make the transtion later on?
    Say that a masters student graduates in a DS program with heavy focus on data and CS (so knows the basics of CS like data structures and programming, and has also studied courses like data mining, big data analytics, and machine learning), what are their possible job openings and relatively easy positions to get into? My understanding is really rudimentaly, feel free to correct me pls: Data scientist, this should be the most fitting and easy-to-get-interview position. Difficulty level 1/5. Data analyist, same as 1. Difficulty level 1/5. Data engineer, same as 1. Difficulty level 1/5. Machine learning engineer, has a higher bar than 1, 2, and 3, and it's very difficult to get interviews without proper background and work experience. So it's very difficult to become one for a masters graduate in DS, but it's quite possible for DS/DE (but much less so for DA) to make into MLE positions. Difficulty level 3/5. Software engineer, it's totally another realm, and has very few skill overlap with 1, 2, and 3. So it's very hard to make the transition or land a SWE job for DS students. Difficulty level 5/5. submitted by /u/Competitive_Map_935 [link] [comments]  ( 1 min )
    [Project] Open-source playground to generate images from text using DALL-E Mini
    submitted by /u/koryoislie [link] [comments]
    [D] Incorporating node features into GNNs?
    Hey all, I am looking to learn more about how to incorporate node features with their embeddings for training. Specifically, I am working with gene-gene interaction networks, and also want to include RNA-sequencing quantifications. If anyone has a good introductory resource so I can familiarize myself with the process, I would really appreciate it! submitted by /u/PM_ME_A_ONELINER [link] [comments]  ( 1 min )
    [D] Moderation uniformity in subreddit
    This isn't meant to be a rant. Rather far from it. Yesterday I posted a legitimate question about databases choices in r/MachineLearning. This was about what technical choices ML members are currently using for large scale data ingestion in a continual learning environment. The post was removed. That post was neither a (1) beginner nor (2) offensive and (3) aimed to be a constructive discussion suitable as mid-range ML query and (4) marked with appropriate flair I finally posted it elsewhere. Yet today I see questions about transitions between DS -> MLE and quirky labgroup names which can be used from ML terms. These aren't even research questions. https://www.reddit.com/r/MachineLearning/comments/u503vz/d_is_it_easier_to_transition_to_mle_as_a_ds_or_swe/ https://www.reddit.com/r/MachineLearning/comments/u5091o/d_do_you_know_any_funny_team_names_with/ How is this fair moderation genuinely? How can we improve the noise filter or do better moderation? submitted by /u/mlbloke [link] [comments]  ( 2 min )
    [D] Counterfactual Fairness
    So I watched this old video by Microsoft Research: https://www.youtube.com/watch?v=psA4U6nhZ70 To summarize, it uses the fairness criteria that sensitive attribute A will give the same prediction regardless of the value, when using counterfactuals. That is, if you're male or female, it shouldn't influence the models predictions. The idea seems decent at first glance. But what if the "bias" or "unfairness" that the model creates based on sensitive attribute A isn't caused by a dataset bias but rather detects a real signal in the data? The model proposed by Microsoft Research doesn't take into consideration that the prediction on the sensitive attribute A does not necessarily consist of ONLY unfairness. They simply define it as such. Is such an algorithmic design choice not exactly one of the flaws that we seek to eliminate? Assuming, that not all of the imbalance in predictions by the model on the sensitive attribute A is caused by "unfairness" but that some of it is caused by an inherent difference, then are they not introducing direct human bias and unfairness into their model by explicitedly designing the system to fit their own human (and political) bias? Don't get me wrong; the opposite is just as bad. Assuming that ALL of the imbalance in the prediction on the sensitive attribute A is caused by "inherent differences" is just as bad. Do you know of anyone that has tackled this in a good manner? How would you even begin to estimate how much is due to an "inherent difference" and how much is due to "bias, unfairness, noise" (or otherwise)? submitted by /u/caahel [link] [comments]  ( 4 min )
    [D] Paper Explained - Transformer Memory as a Differentiable Search Index (Full Video Walkthrough)
    https://youtu.be/qlB0TPBQ7YY Search engines work by building an index and then looking up things in it. Usually, that index is a separate data structure. In keyword search, we build and store reverse indices. In neural search, we build nearest-neighbor indices. This paper does something different: It directly trains a Transformer to return the ID of the most relevant document. No similarity search over embeddings or anything like this is performed, and no external data structure is needed, as the entire index is essentially captured by the model's weights. The paper experiments with various ways of representing documents and training the system, which works surprisingly well! OUTLINE: 0:00 - Intro 0:45 - Sponsor: Diffgram 1:35 - Paper overview 3:15 - The search problem, classic and neural 8:15 - Seq2seq for directly predicting document IDs 11:05 - Differentiable search index architecture 18:05 - Indexing 25:15 - Retrieval and document representation 33:25 - Training DSI 39:15 - Experimental results 49:25 - Comments & Conclusions ​ Paper: https://arxiv.org/abs/2202.06991 submitted by /u/ykilcher [link] [comments]  ( 1 min )
    [R] Useful method to train models for adversarial robustness
    submitted by /u/IncredibleMac [link] [comments]
    [P] Comparing Default VS Custom Reward Function for Optimal Health Management of a DeepRL Agent Playing Tekken
    submitted by /u/DIAMBRA_AIArena [link] [comments]  ( 1 min )
    [D] Spotify's Podcast Search Explained
    I wrote this article breaking down how Spotify have applied semantic search to enhance podcast discovery. I find it super interesting to see the approach Spotify have used in terms of data sources, model fine-tuning, and vector search - and wanted to show how to almost replicate it. Let me know if you have any thoughts on their approach! submitted by /u/jamescalam [link] [comments]
    [R] Machine learning in management of precautionary closures caused by lipophilic biotoxins
    In this work, we have covered a deep study of alternatives in order to improve the aquaculture of mussels with very noisy and unbalanced data https://www.sciencedirect.com/science/article/pii/S0168169922002733 submitted by /u/ennanco [link] [comments]
    [P] RR-GCN now supports multi-modal learning!
    We have just released v0.0.2 of our RR-GCN. This release includes support for multi-modal learning. Node embeddings can now be initialised with literal information or pre-trained embeddings for text and image data. Go check out our notebooks that show how we can achieve state-of-the-art performance on several benchmark datasets in less than one minute. Moreover, and more importantly, the representations produced by our RR-GCN are unsupervised and parameter-free (i.e. no training is required), making it possible to re-use them for multiple downstream ML tasks with high predictive performances. ​ https://github.com/predict-idlab/RR-GCN submitted by /u/givdwiel [link] [comments]  ( 1 min )
    [D] Paper Explained – SEER explained: Vision Models more Robust & Fair when pretrained on UNCURATED images!?
    https://youtu.be/XHAoV_nKr1o This video explains the 10 billion parameter SEER model from MetaAI by Goyal et al. 2022. Paper link: https://arxiv.org/abs/2202.08360 Official implementation: https://github.com/facebookresearch/vissl/tree/main/projects/SEER Short description: The 10 billion parameter SEER model from u/MetaAI is *fairer*, even though it is trained on *uncurated* data. How so? Check out our take on this. Outline: 00:00 Training on uncurated data 01:12 Diffgram (Sponsor) 01:46 Toxicity in large models 02:43 What to do against model toxicity? 03:53 SEER model explained 06:52 SEER is fairer. But how? submitted by /u/AICoffeeBreak [link] [comments]  ( 1 min )
  • Open

    Is there an AI I could use to create an artificial Terence McKenna chatbot?
    He’s basically this wacky dead philosopher with 1000s of hours of his lectures of YT and I was thinking it may be possible to create an artificial AI personality of his from all of his recorded speech? Would there be a simple enough program I could download or anything of the sort? submitted by /u/Vaporshots [link] [comments]  ( 1 min )
    Boids: An artificial life simulation of a flock of birds
    submitted by /u/Seitoh [link] [comments]  ( 1 min )
    AI Trippy Dream 37 - Psychedelic Special Request
    submitted by /u/LordPewPew777 [link] [comments]
    Amazing Generation
    Looks Amazing. The vibe is there. What do you think ? How did he archive this ? Created by Hand or through artificial ? https://www.tiktok.com/@ai.metascape/video/7086451191151971586 submitted by /u/PillowG1rl [link] [comments]
    Little Baby Chibi Lucy Loud
    submitted by /u/VIRUS-AOTOXIN [link] [comments]
    Artificial Intelligence is the Future of Deterrence
    submitted by /u/much_successes [link] [comments]
    LinkedIn Open-Sources ‘Feathr’, It’s Feature Store To Simplify Machine Learning (ML) Feature Management And Improve Developer Productivity
    LinkedIn research team has recently open-sourced feature store, Feathr, created to simplify machine learning (ML) feature management and increase developer productivity. Feathr is used by dozens of LinkedIn applications to define features, compute them for training, deploy them in production, and share them across consumers. Compared to previous application-specific feature pipeline solutions, Feathr users reported significantly reduced time required to add new features to model training and improved runtime performance. Hundreds of ML models run on LinkedIn in Search, Feed, and Ads applications. Thousands of features about entities in the Economic Graph, such as companies, job postings, and LinkedIn members, power the models. The most time-consuming aspects of handling the ML applications at scale have been preparing and managing features. Continue reading the summary Github: https://github.com/linkedin/feathr LinkedIn Blog: https://engineering.linkedin.com/blog/2022/open-sourcing-feathr—linkedin-s-feature-store-for-productive-m submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    Artificial Nightmares: Crypt Walker || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
    I created a DIY python package to ensemble multimodal models
    Multimodal: A python package to ensemble speech, text, etc. models and build new applications. Sample Applications: Speech Named Entity Anonymizer, Speech Question Answering, Speech Generation Code: kritiksoman/Multimodal: Listen. Write. Speak. Read. Think. (github.com) submitted by /u/kritiksoman [link] [comments]
  • Open

    Rigorous treatment of MDPs, Bellman, etc. in continuous spaces?
    I am looking for a book/monograph that goes through all the basics of reinforcement learning for continuous spaces with mathematical rigor. The classic RL book from Sutton/Barto and the new RL theory book from Agarwal/Jiang/Kakade/Sun both stick to finite MDPs except for special cases like linear MDPs and the LQR. I assume that a general statement of the fundamentals for continuous spaces will require grinding through a lot of details on existence, measurability, suprema vs. maxima, etc., that are not issues in the finite case. Is this why these authors avoid it? clarifying edit: I don't need to go all the way to continuous time - just state and action spaces. Maybe one of Bertsekas's books? submitted by /u/quadprog [link] [comments]  ( 1 min )
    NEED HELP MAKING A BASIC PYTHON MODEL
    I have a 2 column dataset “Date” “Result”. The Result column produces a 0 or 1 for each date. I need to make a reinforcement model that will predict whether or not the next result will be a 0 or 1. It needs to be done in jupyter notebook . submitted by /u/EffectiveBug4629 [link] [comments]  ( 1 min )
    From machine learning to sequential decision problems (reinforcement learning)
    Any reinforcement learning problem can be modeled as a sequential decision problem (SDP), which can always be modeled as a Markov decision process. An example of an SDP is a multiarmed bandit problem, where the state is the vector of beliefs about the performance of each arm (or beliefs about a continuous parametric model). Decisions are made by a policy, and there are four classes of policies. For some reason, the RL community tends to focus on just one of the four classes (UCB policies, which fall in the class of cost function approximations), but there are entire communities using each of the other three classes. See chapter 7 of my new book for a complete summary of the four classes for pure learning problems (aka bandit problems). See https://tinyurl.com/RLandSO/ Curious why Sutton and Barto (2nd edition) cover bandit problems in chapter 2, and then introduce MDPs in chapter 3. A bandit problem *is* an MDP! submitted by /u/powell-sda [link] [comments]  ( 1 min )
    Policy gradient vs. Policy iteration?
    Hello, I'm currently learning about MDPs and machine learning. I have a few questions that might be trivial or obvious but I can't find many concrete answers online: -Are policy gradient and policy iteration similar/the same? From what I can gather, policy iteration is a type or subset of policy gradient algorithm, is this correct? -Are all policy learning methods less effective for large state spaces? From my understanding you need to use some kind of value function iteration and heursitic function for larger state spaces because you can't encounter all states enough times to converge on an optimal policy -Does convergence on a policy/value function find a local or global optimum? With neural nets, simple backpropagation may only find a local minimum for the cost function, is this true of MDP/RL iteration algorithms? Thanks!! submitted by /u/egad_a_mouse [link] [comments]  ( 2 min )
    How to create a layer without inputs in tensorflow.
    In deep rl algorithm like PPO, a continuous stochastic policy is represented by Normal Distribution. For this the recommended way of creating a Normal Distribution is to get the mean by passing the state through NN and then using a state independent layer to predict log_std. This layer which predicts log_std should be trainable using backprop just like biases. So how to create this layer in tensorflow 2. submitted by /u/Better-Ad8608 [link] [comments]  ( 1 min )
  • Open

    Machine Learning-based Anomaly Detection in Optical Fiber Monitoring. (arXiv:2204.07059v1 [cs.NI])
    Secure and reliable data communication in optical networks is critical for high-speed Internet. However, optical fibers, serving as the data transmission medium providing connectivity to billons of users worldwide, are prone to a variety of anomalies resulting from hard failures (e.g., fiber cuts) and malicious physical attacks (e.g., optical eavesdropping (fiber tapping)) etc. Such anomalies may cause network disruption and thereby inducing huge financial and data losses, or compromise the confidentiality of optical networks by gaining unauthorized access to the carried data, or gradually degrade the network operations. Therefore, it is highly required to implement efficient anomaly detection, diagnosis, and localization schemes for enhancing the availability and reliability of optical networks. In this paper, we propose a data driven approach to accurately and quickly detect, diagnose, and localize fiber anomalies including fiber cuts, and optical eavesdropping attacks. The proposed method combines an autoencoder-based anomaly detection and an attention-based bidirectional gated recurrent unit algorithm, whereby the former is used for fault detection and the latter is adopted for fault diagnosis and localization once an anomaly is detected by the autoencoder. We verify the efficiency of our proposed approach by experiments under various anomaly scenarios using real operational data. The experimental results demonstrate that: (i) the autoencoder detects any fiber fault or anomaly with an F1 score of 96.86%; and (ii) the attention-based bidirectional gated recurrent unit algorithm identifies the the detected anomalies with an average accuracy of 98.2%, and localizes the faults with an average root mean square error of 0.19 m.  ( 2 min )
    Robust No-Regret Learning in Min-Max Stackelberg Games. (arXiv:2203.14126v2 [cs.GT] UPDATED)
    The behavior of no-regret learning algorithms is well understood in two-player min-max (i.e, zero-sum) games. In this paper, we investigate the behavior of no-regret learning in min-max games with dependent strategy sets, where the strategy of the first player constrains the behavior of the second. Such games are best understood as sequential, i.e., min-max Stackelberg, games. We consider two settings, one in which only the first player chooses their actions using a no-regret algorithm while the second player best responds, and one in which both players use no-regret algorithms. For the former case, we show that no-regret dynamics converge to a Stackelberg equilibrium. For the latter case, we introduce a new type of regret, which we call Lagrangian regret, and show that if both players minimize their Lagrangian regrets, then play converges to a Stackelberg equilibrium. We then observe that online mirror descent (OMD) dynamics in these two settings correspond respectively to a known nested (i.e., sequential) gradient descent-ascent (GDA) algorithm and a new simultaneous GDA-like algorithm, thereby establishing convergence of these algorithms to Stackelberg equilibrium. Finally, we analyze the robustness of OMD dynamics to perturbations by investigating online min-max Stackelberg games. We prove that OMD dynamics are robust for a large class of online min-max games with independent strategy sets. In the dependent case, we demonstrate the robustness of OMD dynamics experimentally by simulating them in online Fisher markets, a canonical example of a min-max Stackelberg game with dependent strategy sets.  ( 2 min )
    Open-Set Recognition: a Good Closed-Set Classifier is All You Need?. (arXiv:2110.06207v2 [cs.CV] CROSS LISTED)
    The ability to identify whether or not a test sample belongs to one of the semantic classes in a classifier's training set is critical to practical deployment of the model. This task is termed open-set recognition (OSR) and has received significant attention in recent years. In this paper, we first demonstrate that the ability of a classifier to make the 'none-of-above' decision is highly correlated with its accuracy on the closed-set classes. We find that this relationship holds across loss objectives and architectures, and further demonstrate the trend both on the standard OSR benchmarks as well as on a large-scale ImageNet evaluation. Second, we use this correlation to boost the performance of a maximum logit score OSR 'baseline' by improving its closed-set accuracy, and with this strong baseline achieve state-of-the-art on a number of OSR benchmarks. Similarly, we boost the performance of the existing state-of-the-art method by improving its closed-set accuracy, but the resulting discrepancy with the strong baseline is marginal. Our third contribution is to present the 'Semantic Shift Benchmark' (SSB), which better respects the task of detecting semantic novelty, in contrast to other forms of distribution shift also considered in related sub-fields, such as out-of-distribution detection. On this new evaluation, we again demonstrate that there is negligible difference between the strong baseline and the existing state-of-the-art. Project Page: https://www.robots.ox.ac.uk/~vgg/research/osr/  ( 2 min )
    Gradient boosting for convex cone predict and optimize problems. (arXiv:2204.06895v1 [cs.LG])
    Many problems in engineering and statistics involve both predictive forecasting and decision-based optimization. Traditionally, predictive models are optimized independently from the final decision-based optimization problem. In contrast, a `smart, predict then optimize' (SPO) framework optimizes prediction models to explicitly minimize the final downstream decision loss. In this paper we present dboost, a gradient boosting algorithm for training prediction model ensembles to minimize decision regret. The dboost framework supports any convex optimization program that can be cast as convex quadratic cone program and gradient boosting is performed by implicit differentiation of a custom fixed-point mapping. To our knowledge, the dboost framework is the first general purpose implementation of gradient boosting to predict and optimize problems. Experimental results comparing with state-of-the-art SPO methods show that dboost can further reduce out-of-sample decision regret.  ( 2 min )
    YOLO-Pose: Enhancing YOLO for Multi Person Pose Estimation Using Object Keypoint Similarity Loss. (arXiv:2204.06806v1 [cs.CV])
    We introduce YOLO-pose, a novel heatmap-free approach for joint detection, and 2D multi-person pose estimation in an image based on the popular YOLO object detection framework. Existing heatmap based two-stage approaches are sub-optimal as they are not end-to-end trainable and training relies on a surrogate L1 loss that is not equivalent to maximizing the evaluation metric, i.e. Object Keypoint Similarity (OKS). Our framework allows us to train the model end-to-end and optimize the OKS metric itself. The proposed model learns to jointly detect bounding boxes for multiple persons and their corresponding 2D poses in a single forward pass and thus bringing in the best of both top-down and bottom-up approaches. Proposed approach doesn't require the postprocessing of bottom-up approaches to group detected keypoints into a skeleton as each bounding box has an associated pose, resulting in an inherent grouping of the keypoints. Unlike top-down approaches, multiple forward passes are done away with since all persons are localized along with their pose in a single inference. YOLO-pose achieves new state-of-the-art results on COCO validation (90.2% AP50) and test-dev set (90.3% AP50), surpassing all existing bottom-up approaches in a single forward pass without flip test, multi-scale testing, or any other test time augmentation. All experiments and results reported in this paper are without any test time augmentation, unlike traditional approaches that use flip-test and multi-scale testing to boost performance. Our training codes will be made publicly available at https://github.com/TexasInstruments/edgeai-yolov5 and https://github.com/TexasInstruments/edgeai-yolox  ( 2 min )
    MARF: Multiscale Adaptive-switch Random Forest for Leg Detection with 2D Laser Scanners. (arXiv:2204.06833v1 [cs.RO])
    For the 2D laser-based tasks, e.g., people detection and people tracking, leg detection is usually the first step. Thus, it carries great weight in determining the performance of people detection and people tracking. However, many leg detectors ignore the inevitable noise and the multiscale characteristics of the laser scan, which makes them sensitive to the unreliable features of point cloud and further degrades the performance of the leg detector. In this paper, we propose a multiscale adaptive-switch Random Forest (MARF) to overcome these two challenges. Firstly, the adaptive-switch decision tree is designed to use noisesensitive features to conduct weighted classification and noiseinvariant features to conduct binary classification, which makes our detector perform more robust to noise. Secondly, considering the multiscale property that the sparsity of the 2D point cloud is proportional to the length of laser beams, we design a multiscale random forest structure to detect legs at different distances. Moreover, the proposed approach allows us to discover a sparser human leg from point clouds than others. Consequently, our method shows an improved performance compared to other state-of-the-art leg detectors on the challenging Moving Legs dataset and retains the whole pipeline at a speed of 60+ FPS on lowcomputational laptops. Moreover, we further apply the proposed MARF to the people detection and tracking system, achieving a considerable gain in all metrics.  ( 2 min )
    Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations. (arXiv:2202.07800v2 [cs.CV] UPDATED)
    Vision Transformers (ViTs) take all the image patches as tokens and construct multi-head self-attention (MHSA) among them. Complete leverage of these image tokens brings redundant computations since not all the tokens are attentive in MHSA. Examples include that tokens containing semantically meaningless or distractive image backgrounds do not positively contribute to the ViT predictions. In this work, we propose to reorganize image tokens during the feed-forward process of ViT models, which is integrated into ViT during training. For each forward inference, we identify the attentive image tokens between MHSA and FFN (i.e., feed-forward network) modules, which is guided by the corresponding class token attention. Then, we reorganize image tokens by preserving attentive image tokens and fusing inattentive ones to expedite subsequent MHSA and FFN computations. To this end, our method EViT improves ViTs from two perspectives. First, under the same amount of input image tokens, our method reduces MHSA and FFN computation for efficient inference. For instance, the inference speed of DeiT-S is increased by 50% while its recognition accuracy is decreased by only 0.3% for ImageNet classification. Second, by maintaining the same computational cost, our method empowers ViTs to take more image tokens as input for recognition accuracy improvement, where the image tokens are from higher resolution images. An example is that we improve the recognition accuracy of DeiT-S by 1% for ImageNet classification at the same computational cost of a vanilla DeiT-S. Meanwhile, our method does not introduce more parameters to ViTs. Experiments on the standard benchmarks show the effectiveness of our method. The code is available at https://github.com/youweiliang/evit  ( 2 min )
    Learning Task-Aware Energy Disaggregation: a Federated Approach. (arXiv:2204.06767v1 [cs.LG])
    We consider the problem of learning the energy disaggregation signals for residential load data. Such task is referred as non-intrusive load monitoring (NILM), and in order to find individual devices' power consumption profiles based on aggregated meter measurements, a machine learning model is usually trained based on large amount of training data coming from a number of residential homes. Yet collecting such residential load datasets require both huge efforts and customers' approval on sharing metering data, while load data coming from different regions or electricity users may exhibit heterogeneous usage patterns. Both practical concerns make training a single, centralized NILM model challenging. In this paper, we propose a decentralized and task-adaptive learning scheme for NILM tasks, where nested meta learning and federated learning steps are designed for learning task-specific models collectively. Simulation results on benchmark dataset validate proposed algorithm's performance on efficiently inferring appliance-level consumption for a variety of homes and appliances.  ( 2 min )
    Ranking Feature-Block Importance in Artificial Multiblock Neural Networks. (arXiv:2109.10279v2 [cs.LG] UPDATED)
    In artificial neural networks, understanding the contributions of input features on the prediction fosters model explainability and delivers relevant information about the dataset. While typical setups for feature importance ranking assess input features individually, in this study, we go one step further and rank the importance of groups of features, denoted as feature-blocks. A feature-block can contain features of a specific type or features derived from a particular source, which are presented to the neural network in separate input branches (multiblock ANNs). This work presents three methods pursuing distinct strategies to rank features in multiblock ANNs by their importance: (1) a composite strategy building on individual feature importance rankings, (2) a knock-in, and (3) a knock-out strategy. While the composite strategy builds on state-of-the-art feature importance rankings, knock-in and knock-out strategies evaluate the block as a whole via a mutual information criterion. Our experiments consist of a simulation study validating all three approaches, followed by a case study on two distinct real-world datasets to compare the strategies. We conclude that each strategy has its merits for specific application scenarios.  ( 2 min )
    Finding MNEMON: Reviving Memories of Node Embeddings. (arXiv:2204.06963v1 [cs.LG])
    Previous security research efforts orbiting around graphs have been exclusively focusing on either (de-)anonymizing the graphs or understanding the security and privacy issues of graph neural networks. Little attention has been paid to understand the privacy risks of integrating the output from graph embedding models (e.g., node embeddings) with complex downstream machine learning pipelines. In this paper, we fill this gap and propose a novel model-agnostic graph recovery attack that exploits the implicit graph structural information preserved in the embeddings of graph nodes. We show that an adversary can recover edges with decent accuracy by only gaining access to the node embedding matrix of the original graph without interactions with the node embedding models. We demonstrate the effectiveness and applicability of our graph recovery attack through extensive experiments.  ( 2 min )
    GM-TOuNN: Graded Multiscale Topology Optimization using Neural Networks. (arXiv:2204.06682v1 [cs.CE])
    Multiscale topology optimization (M-TO) entails generating an optimal global topology, and an optimal set of microstructures at a smaller scale, for a physics-constrained problem. With the advent of additive manufacturing, M-TO has gained significant prominence. However, generating optimal microstructures at various locations can be computationally very expensive. As an alternate, graded multiscale topology optimization (GM-TO) has been proposed where one or more pre-selected and graded (parameterized) microstructural topologies are used to fill the domain optimally. This leads to a significant reduction in computation while retaining many of the benefits of M-TO. A successful GM-TO framework must: (1) be capable of efficiently handling numerous pre-selected microstructures, (2) be able to continuously switch between these microstructures during optimization, (3) ensure that the partition of unity is satisfied, and (4) discourage microstructure mixing at termination. In this paper, we propose to meet these requirements by exploiting the unique classification capacity of neural networks. Specifically, we propose a graded multiscale topology optimization using neural-network (GM-TOuNN) framework with the following features: (1) the number of design variables is only weakly dependent on the number of pre-selected microstructures, (2) it guarantees partition of unity while discouraging microstructure mixing, and (3) it supports automatic differentiation, thereby eliminating manual sensitivity analysis. The proposed framework is illustrated through several examples.  ( 2 min )
    Medical Application of Geometric Deep Learning for the Diagnosis of Glaucoma. (arXiv:2204.07004v1 [eess.IV])
    Purpose: (1) To assess the performance of geometric deep learning (PointNet) in diagnosing glaucoma from a single optical coherence tomography (OCT) 3D scan of the optic nerve head (ONH); (2) To compare its performance to that obtained with a standard 3D convolutional neural network (CNN), and with a gold-standard glaucoma parameter, i.e. retinal nerve fiber layer (RNFL) thickness. Methods: 3D raster scans of the ONH were acquired with Spectralis OCT for 477 glaucoma and 2,296 non-glaucoma subjects at the Singapore National Eye Centre. All volumes were automatically segmented using deep learning to identify 7 major neural and connective tissues including the RNFL, the prelamina, and the lamina cribrosa (LC). Each ONH was then represented as a 3D point cloud with 1,000 points chosen randomly from all tissue boundaries. To simplify the problem, all ONH point clouds were aligned with respect to the plane and center of Bruch's membrane opening. Geometric deep learning (PointNet) was then used to provide a glaucoma diagnosis from a single OCT point cloud. The performance of our approach was compared to that obtained with a 3D CNN, and with RNFL thickness. Results: PointNet was able to provide a robust glaucoma diagnosis solely from the ONH represented as a 3D point cloud (AUC=95%). The performance of PointNet was superior to that obtained with a standard 3D CNN (AUC=87%) and with that obtained from RNFL thickness alone (AUC=80%). Discussion: We provide a proof-of-principle for the application of geometric deep learning in the field of glaucoma. Our technique requires significantly less information as input to perform better than a 3D CNN, and with an AUC superior to that obtained from RNFL thickness alone. Geometric deep learning may have wide applicability in the field of Ophthalmology.  ( 2 min )
    Achieving Representative Data via Convex Hull Feasibility Sampling Algorithms. (arXiv:2204.06664v1 [stat.ML])
    Sampling biases in training data are a major source of algorithmic biases in machine learning systems. Although there are many methods that attempt to mitigate such algorithmic biases during training, the most direct and obvious way is simply collecting more representative training data. In this paper, we consider the task of assembling a training dataset in which minority groups are adequately represented from a given set of data sources. In essence, this is an adaptive sampling problem to determine if a given point lies in the convex hull of the means from a set of unknown distributions. We present adaptive sampling methods to determine, with high confidence, whether it is possible to assemble a representative dataset from the given data sources. We also demonstrate the efficacy of our policies in simulations in the Bernoulli and a multinomial setting.  ( 2 min )
    Word Embeddings Are Capable of Capturing Rhythmic Similarity of Words. (arXiv:2204.04833v2 [cs.CL] UPDATED)
    Word embedding systems such as Word2Vec and GloVe are well-known in deep learning approaches to NLP. This is largely due to their ability to capture semantic relationships between words. In this work we investigated their usefulness in capturing rhythmic similarity of words instead. The results show that vectors these embeddings assign to rhyming words are more similar to each other, compared to the other words. It is also revealed that GloVe performs relatively better than Word2Vec in this regard. We also proposed a first of its kind metric for quantifying rhythmic similarity of a pair of words.  ( 2 min )
    BrainGB: A Benchmark for Brain Network Analysis with Graph Neural Networks. (arXiv:2204.07054v1 [q-bio.NC])
    Mapping the connectome of the human brain using structural or functional connectivity has become one of the most pervasive paradigms for neuroimaging analysis. Recently, Graph Neural Networks (GNNs) motivated from geometric deep learning have attracted broad interest due to their established power for modeling complex networked data. Despite their established performance in other fields, there has not yet been a systematic study of how to design effective GNNs for brain network analysis. To bridge this gap, we present BrainGB, a benchmark for brain network analysis with GNNs. BrainGB standardizes the process by 1) summarizing brain network construction pipelines for both functional and structural neuroimaging modalities and 2) modularizing the implementation of GNN designs. We conduct extensive experiments on datasets across cohorts and modalities and recommend a set of general recipes for effective GNN designs on brain networks. To support open and reproducible research on GNN-based brain network analysis, we also host the BrainGB website at https:// brainnet.us/ with models, tutorials, examples, as well as an out-of-box Python package. We hope that this work will provide useful empirical evidence and offer insights for future research in this novel and promising direction.  ( 2 min )
    Accelerated Policy Learning with Parallel Differentiable Simulation. (arXiv:2204.07137v1 [cs.LG])
    Deep reinforcement learning can generate complex control policies, but requires large amounts of training data to work effectively. Recent work has attempted to address this issue by leveraging differentiable simulators. However, inherent problems such as local minima and exploding/vanishing numerical gradients prevent these methods from being generally applied to control tasks with complex contact-rich dynamics, such as humanoid locomotion in classical RL benchmarks. In this work we present a high-performance differentiable simulator and a new policy learning algorithm (SHAC) that can effectively leverage simulation gradients, even in the presence of non-smoothness. Our learning algorithm alleviates problems with local minima through a smooth critic function, avoids vanishing/exploding gradients through a truncated learning window, and allows many physical environments to be run in parallel. We evaluate our method on classical RL control tasks, and show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms. In addition, we demonstrate the scalability of our method by applying it to the challenging high-dimensional problem of muscle-actuated locomotion with a large action space, achieving a greater than 17x reduction in training time over the best-performing established RL algorithm.  ( 2 min )
    Group-Sparse Matrix Factorization for Transfer Learning of Word Embeddings. (arXiv:2104.08928v2 [stat.ML] UPDATED)
    Unstructured text provides decision-makers with a rich data source in many domains, ranging from product reviews in retailing to nursing notes in healthcare. To leverage this information, words are typically translated into word embeddings -- vectors that encode the semantic relationships between words -- through unsupervised learning algorithms such as matrix factorization. However, learning word embeddings from new domains with limited training data can be challenging, because the meaning/usage may be different in the new domain, e.g., the word "positive" typically has positive sentiment, but often has negative sentiment in medical notes since it may imply that a patient is tested positive for a disease. Intuitively, we expect that only a small number of domain-specific words may have new meanings/usages. We propose an intuitive two-stage estimator that exploits this structure via a group-sparse penalty to efficiently transfer learn domain-specific word embeddings by combining large-scale text corpora (such as Wikipedia) with limited domain-specific text data. We bound the generalization error of our estimator, proving that it can achieve the same accuracy (compared to not transfer learning) with substantially less domain-specific data when only a small number of embeddings are altered between domains. Our results provide the first bounds on group-sparse matrix factorization, which may be of independent interest. We empirically evaluate the effectiveness of our approach compared to state-of-the-art fine-tuning heuristics from natural language processing.  ( 2 min )
    Latent Aspect Detection from Online Unsolicited Customer Reviews. (arXiv:2204.06964v1 [cs.CL])
    Within the context of review analytics, aspects are the features of products and services at which customers target their opinions and sentiments. Aspect detection helps product owners and service providers to identify shortcomings and prioritize customers' needs, and hence, maintain revenues and mitigate customer churn. Existing methods focus on detecting the surface form of an aspect by training supervised learning methods that fall short when aspects are latent in reviews. In this paper, we propose an unsupervised method to extract latent occurrences of aspects. Specifically, we assume that a customer undergoes a two-stage hypothetical generative process when writing a review: (1) deciding on an aspect amongst the set of aspects available for the product or service, and (2) writing the opinion words that are more interrelated to the chosen aspect from the set of all words available in a language. We employ latent Dirichlet allocation to learn the latent aspects distributions for generating the reviews. Experimental results on benchmark datasets show that our proposed method is able to improve the state of the art when the aspects are latent with no surface form in reviews.  ( 2 min )
    Neighborhood Attention Transformer. (arXiv:2204.07143v1 [cs.CV])
    We present Neighborhood Attention Transformer (NAT), an efficient, accurate and scalable hierarchical transformer that works well on both image classification and downstream vision tasks. It is built upon Neighborhood Attention (NA), a simple and flexible attention mechanism that localizes the receptive field for each query to its nearest neighboring pixels. NA is a localization of self-attention, and approaches it as the receptive field size increases. It is also equivalent in FLOPs and memory usage to Swin Transformer's shifted window attention given the same receptive field size, while being less constrained. Furthermore, NA includes local inductive biases, which eliminate the need for extra operations such as pixel shifts. Experimental results on NAT are competitive; NAT-Tiny reaches 83.2% top-1 accuracy on ImageNet with only 4.3 GFLOPs and 28M parameters, 51.4% mAP on MS-COCO and 48.4% mIoU on ADE20k. We will open-source our checkpoints, training script, configurations, and our CUDA kernel at: https://github.com/SHI-Labs/Neighborhood-Attention-Transformer .  ( 2 min )
    Fix Bugs with Transformer through a Neural-Symbolic Edit Grammar. (arXiv:2204.06643v1 [cs.LG])
    We introduce NSEdit (neural-symbolic edit), a novel Transformer-based code repair method. Given only the source code that contains bugs, NSEdit predicts an editing sequence that can fix the bugs. The edit grammar is formulated as a regular language, and the Transformer uses it as a neural-symbolic scripting interface to generate editing programs. We modify the Transformer and add a pointer network to select the edit locations. An ensemble of rerankers are trained to re-rank the editing sequences generated by beam search. We fine-tune the rerankers on the validation set to reduce over-fitting. NSEdit is evaluated on various code repair datasets and achieved a new state-of-the-art accuracy ($24.04\%$) on the Tufano small dataset of the CodeXGLUE benchmark. NSEdit performs robustly when programs vary from packages to packages and when buggy programs are concrete. We conduct detailed analysis on our methods and demonstrate the effectiveness of each component.  ( 2 min )
    Reflective Fiber Faults Detection and Characterization Using Long-Short-Term Memory. (arXiv:2204.07058v1 [cs.NI])
    To reduce operation-and-maintenance expenses (OPEX) and to ensure optical network survivability, optical network operators need to detect and diagnose faults in a timely manner and with high accuracy. With the rapid advancement of telemetry technology and data analysis techniques, data-driven approaches leveraging telemetry data to tackle the fault diagnosis problem have been gaining popularity due to their quick implementation and deployment. In this paper, we propose a novel multi-task learning model based on long short-term memory (LSTM) to detect, locate, and estimate the reflectance of fiber reflective faults (events) including the connectors and the mechanical splices by extracting insights from monitored data obtained by the optical time domain reflectometry (OTDR) principle commonly used for troubleshooting of fiber optic cables or links. The experimental results prove that the proposed method: (i) achieves a good detection capability and high localization accuracy within short measurement time even for low SNR values; and (ii) outperforms conventionally employed techniques.
    The Power of Linear Recurrent Neural Networks. (arXiv:1802.03308v6 [cs.LG] UPDATED)
    Recurrent neural networks are a powerful means to cope with time series. We show how linear, i.e., linearly activated recurrent neural networks (LRNNs) can approximate any time-dependent function f(t) given by a number of function values. The approximation can effectively be learned by simply solving a linear equation system; no backpropagation or similar methods are needed. Furthermore, the size of an LRNN can be reduced significantly in one step, after inspecting the eigenvalues of the network transition matrix, by taking only the most relevant components. Therefore, in contrast to others, we do not only learn network weights but also the network architecture. LRNNs have interesting properties: They end up in ellipse trajectories in the long run and allow the prediction of further values and compact representations of functions. We demonstrate this by several experiments, among them multiple superimposed oscillators (MSO), robotic soccer, and predicting stock prices. LRNNs outperform the previous state-of-the-art for the MSO task with a minimal number of units.  ( 2 min )
    Incompleteness of graph convolutional neural networks for points clouds in three dimensions. (arXiv:2201.07136v2 [stat.ML] UPDATED)
    Graph neural networks (GNN) are very popular methods in machine learning and have been applied very successfully to the prediction of the properties of molecules and materials. First-order GNNs are well known to be incomplete, i.e., there exist graphs that are distinct but appear identical when seen through the lens of the GNN. More complicated schemes have thus been designed to increase their resolving power. Applications to molecules (and more generally, point clouds), however, add a geometric dimension to the problem. The most straightforward and prevalent approach to construct graph representation for molecules regards atoms as vertices in a graph and draws a bond between each pair of atoms within a chosen cutoff. Bonds can be decorated with the distance between atoms, and the resulting "distance graph NNs" (dGNN) have empirically demonstrated excellent resolving power and are widely used in chemical ML, with all known indistinguishable graphs being resolved in the fully-connected limit. Here we show that even for the restricted case of fully-connected graphs induced by 3D atom clouds dGNNs are not complete. We construct pairs of distinct point clouds that generate graphs that, for any cutoff radius, are equivalent based on a first-order Weisfeiler-Lehman test. This class of degenerate structures includes chemically-plausible configurations, setting an ultimate limit to the expressive power of some of the well-established GNN architectures for atomistic machine learning. Models that explicitly use angular or directional information in the description of atomic environments can resolve these degeneracies.  ( 2 min )
    Interpretability of Machine Learning Methods Applied to Neuroimaging. (arXiv:2204.07005v1 [cs.CV])
    Deep learning methods have become very popular for the processing of natural images, and were then successfully adapted to the neuroimaging field. As these methods are non-transparent, interpretability methods are needed to validate them and ensure their reliability. Indeed, it has been shown that deep learning models may obtain high performance even when using irrelevant features, by exploiting biases in the training set. Such undesirable situations can potentially be detected by using interpretability methods. Recently, many methods have been proposed to interpret neural networks. However, this domain is not mature yet. Machine learning users face two major issues when aiming to interpret their models: which method to choose, and how to assess its reliability? Here, we aim at providing answers to these questions by presenting the most common interpretability methods and metrics developed to assess their reliability, as well as their applications and benchmarks in the neuroimaging context. Note that this is not an exhaustive survey: we aimed to focus on the studies which we found to be the most representative and relevant.  ( 2 min )
    The MIT Supercloud Workload Classification Challenge. (arXiv:2204.05839v2 [cs.DC] UPDATED)
    High-Performance Computing (HPC) centers and cloud providers support an increasingly diverse set of applications on heterogenous hardware. As Artificial Intelligence (AI) and Machine Learning (ML) workloads have become an increasingly larger share of the compute workloads, new approaches to optimized resource usage, allocation, and deployment of new AI frameworks are needed. By identifying compute workloads and their utilization characteristics, HPC systems may be able to better match available resources with the application demand. By leveraging datacenter instrumentation, it may be possible to develop AI-based approaches that can identify workloads and provide feedback to researchers and datacenter operators for improving operational efficiency. To enable this research, we released the MIT Supercloud Dataset, which provides detailed monitoring logs from the MIT Supercloud cluster. This dataset includes CPU and GPU usage by jobs, memory usage, and file system logs. In this paper, we present a workload classification challenge based on this dataset. We introduce a labelled dataset that can be used to develop new approaches to workload classification and present initial results based on existing approaches. The goal of this challenge is to foster algorithmic innovations in the analysis of compute workloads that can achieve higher accuracy than existing methods. Data and code will be made publicly available via the Datacenter Challenge website : https://dcc.mit.edu.  ( 2 min )
    Learning and controlling the source-filter representation of speech with a variational autoencoder. (arXiv:2204.07075v1 [cs.SD])
    Understanding and controlling latent representations in deep generative models is a challenging yet important problem for analyzing, transforming and generating various types of data. In speech processing, inspiring from the anatomical mechanisms of phonation, the source-filter model considers that speech signals are produced from a few independent and physically meaningful continuous latent factors, among which the fundamental frequency $f_0$ and the formants are of primary importance. In this work, we show that the source-filter model of speech production naturally arises in the latent space of a variational autoencoder (VAE) trained in an unsupervised manner on a dataset of natural speech signals. Using only a few seconds of labeled speech signals generated with an artificial speech synthesizer, we experimentally illustrate that $f_0$ and the formant frequencies are encoded in orthogonal subspaces of the VAE latent space and we develop a weakly-supervised method to accurately and independently control these speech factors of variation within the learned latent subspaces. Without requiring additional information such as text or human-labeled data, this results in a deep generative model of speech spectrograms that is conditioned on $f_0$ and the formant frequencies, and which is applied to the transformation of speech signals.  ( 2 min )
    Solving AC Power Flow with Graph Neural Networks under Realistic Constraints. (arXiv:2204.07000v1 [cs.LG])
    In this paper we propose a graph neural network architecture solving the AC power flow problem under realistic constraints. While the energy transition is changing the energy industry to a digitalized and decentralized energy system, the challenges are increasingly shifting to the distribution grid level to integrate new loads and generation technologies. To ensure a save and resilient operation of distribution grids, AC power flow calculations are the means of choice to determine grid operating limits or analyze grid asset utilization in planning procedures. In our approach we demonstrate the development of a framework which makes use of graph neural networks to learn the physical constraints of the power flow. We present our model architecture on which we perform unsupervised training to learn a general solution of the AC power flow formulation that is independent of the specific topologies and supply tasks used for training. Finally, we demonstrate, validate and discuss our results on medium voltage benchmark grids.  ( 2 min )
    Generative power of a protein language model trained on multiple sequence alignments. (arXiv:2204.07110v1 [q-bio.BM])
    Computational models starting from large ensembles of evolutionarily related protein sequences capture a representation of protein families and learn constraints associated to protein structure and function. They thus open the possibility for generating novel sequences belonging to protein families. Protein language models trained on multiple sequence alignments, such as MSA Transformer, are highly attractive candidates to this end. We propose and test an iterative method that directly uses the masked language modeling objective to generate sequences using MSA Transformer. We demonstrate that the resulting sequences generally score better than those generated by Potts models, and even than natural sequences, for homology, coevolution and structure-based measures. Moreover, MSA Transformer better reproduces the higher-order statistics and the distribution of sequences in sequence space of natural data than Potts models, although Potts models better reproduce first- and second-order statistics. MSA Transformer is thus a strong candidate for protein sequence generation and protein design.
    Matrix Completion with Heterogonous Cost. (arXiv:2203.12120v2 [cs.LG] UPDATED)
    The matrix completion problem has been studied broadly under many underlying conditions. The problem has been explored under adaptive or non-adaptive, exact or estimation, single-phase or multi-phase, and many other categories. In most of these cases, the observation cost of each entry is uniform and has the same cost across the columns. However, in many real-life scenarios, we could expect elements from distinct columns or distinct positions to have a different cost. In this paper, we explore this generalization under adaptive conditions. We approach the problem under two different cost models. The first one is that entries from different columns have different observation costs, but, within the same column, each entry has a uniform cost. The second one is any two entry has different observation cost, despite being the same or different columns. We provide complexity analysis of our algorithms and provide tightness guarantees.
    Semi-Supervised Convolutive NMF for Automatic Piano Transcription. (arXiv:2202.04989v2 [cs.SD] UPDATED)
    Automatic Music Transcription, which consists in transforming an audio recording of a musical performance into symbolic format, remains a difficult Music Information Retrieval task. In this work, which focuses on piano transcription, we propose a semi-supervised approach using low-rank matrix factorization techniques, in particular Convolutive Nonnegative Matrix Factorization. In the semi-supervised setting, only a single recording of each individual notes is required. We show on the MAPS dataset that the proposed semi-supervised CNMF method performs better than state-of-the-art low-rank factorization techniques and a little worse than supervised deep learning state-of-the-art methods, while however suffering from generalization issues.
    HCFL: A High Compression Approach for Communication-Efficient Federated Learning in Very Large Scale IoT Networks. (arXiv:2204.06760v1 [cs.LG])
    Federated learning (FL) is a new artificial intelligence concept that enables Internet-of-Things (IoT) devices to learn a collaborative model without sending the raw data to centralized nodes for processing. Despite numerous advantages, low computing resources at IoT devices and high communication costs for exchanging model parameters make applications of FL in massive IoT networks very limited. In this work, we develop a novel compression scheme for FL, called high-compression federated learning (HCFL), for very large scale IoT networks. HCFL can reduce the data load for FL processes without changing their structure and hyperparameters. In this way, we not only can significantly reduce communication costs, but also make intensive learning processes more adaptable on low-computing resource IoT devices. Furthermore, we investigate a relationship between the number of IoT devices and the convergence level of the FL model and thereby better assess the quality of the FL process. We demonstrate our HCFL scheme in both simulations and mathematical analyses. Our proposed theoretical research can be used as a minimum level of satisfaction, proving that the FL process can achieve good performance when a determined configuration is met. Therefore, we show that HCFL is applicable in any FL-integrated networks with numerous IoT devices.
    Reinforcement Learning Policy Recommendation for Interbank Network Stability. (arXiv:2204.07134v1 [econ.GN])
    In this paper we analyze the effect of a policy recommendation on the performances of an artificial interbank market. Financial institutions stipulate lending agreements following a public recommendation and their individual information. The former, modeled by a reinforcement learning optimal policy trying to maximize the long term fitness of the system, gathers information on the economic environment and directs economic actors to create credit relationships based on the optimal choice between a low interest rate or high liquidity supply. The latter, based on the agents' balance sheet, allows to determine the liquidity supply and interest rate that the banks optimally offer on the market. Based on the combination between the public and the private signal, financial institutions create or cut their credit connections over time via a preferential attachment evolving procedure able to generate a dynamic network. Our results show that the emergence of a core-periphery interbank network, combined with a certain level of homogeneity on the size of lenders and borrowers, are essential features to ensure the resilience of the system. Moreover, the reinforcement learning optimal policy recommendation plays a crucial role in mitigating systemic risk with respect to alternative policy instruments.
    Constrained Deep One-Class Feature Learning For Classifying Imbalanced Medical Images. (arXiv:2111.10610v2 [eess.IV] UPDATED)
    Medical image data are usually imbalanced across different classes. One-class classification has attracted increasing attention to address the data imbalance problem by distinguishing the samples of the minority class from the majority class. Previous methods generally aim to either learn a new feature space to map training samples together or to fit training samples by autoencoder-like models. These methods mainly focus on capturing either compact or descriptive features, where the information of the samples of a given one class is not sufficiently utilized. In this paper, we propose a novel deep learning-based method to learn compact features by adding constraints on the bottleneck features, and to preserve descriptive features by training an autoencoder at the same time. Through jointly optimizing the constraining loss and the autoencoder's reconstruction loss, our method can learn more relevant features associated with the given class, making the majority and minority samples more distinguishable. Experimental results on three clinical datasets (including the MRI breast images, FFDM breast images and chest X-ray images) obtains state-of-art performance compared to previous methods.
    Supplementation of deep neural networks with simplified physics-based features to increase model prediction accuracy. (arXiv:2204.06764v1 [cs.ET])
    To improve predictive models for STEM applications, supplemental physics-based features computed from input parameters are introduced into single and multiple layers of a deep neural network (DNN). While many studies focus on informing DNNs with physics through differential equations or numerical simulation, much may be gained through integration of simplified relationships. To evaluate this hypothesis, a number of thin rectangular plates simply-supported on all edges are simulated for five materials. With plate dimensions and material properties as input features and fundamental natural frequency as the sole output, predictive performance of a purely data-driven DNN-based model is compared with models using additional inputs computed from simplified physical relationships among baseline parameters, namely plate weight, modulus of rigidity, and shear modulus. To better understand the benefit to model accuracy, these additional features are injected into various single and multiple DNN layers, and trained with four different dataset sizes. When these physics-enhanced models are evaluated against independent data of the same materials and similar dimensions to the training sets, supplementation with simplified physics-based parameters provides little reduction in prediction error over the baseline for models trained with dataset sizes of 60 and greater, although small improvement from 19.3% to 16.1% occurs when trained with a sparse size of 30. Conversely, notable accuracy gains occur when the independent test data is of material and dimensions not conforming to the training set. Specifically, when physics-enhanced data is injected into multiple DNN layers, reductions in error from 33.2% to 19.6%, 34.9% to 19.9%, 35.8% to 22.4%, and 43.0% to 28.4% are achieved for training dataset sizes of 261, 117, 60, and 30, respectively, demonstrating attainment of a degree of generalizability.
    ConDor: Self-Supervised Canonicalization of 3D Pose for Partial Shapes. (arXiv:2201.07788v2 [cs.CV] UPDATED)
    Progress in 3D object understanding has relied on manually canonicalized shape datasets that contain instances with consistent position and orientation (3D pose). This has made it hard to generalize these methods to in-the-wild shapes, eg., from internet model collections or depth sensors. ConDor is a self-supervised method that learns to Canonicalize the 3D orientation and position for full and partial 3D point clouds. We build on top of Tensor Field Networks (TFNs), a class of permutation- and rotation-equivariant, and translation-invariant 3D networks. During inference, our method takes an unseen full or partial 3D point cloud at an arbitrary pose and outputs an equivariant canonical pose. During training, this network uses self-supervision losses to learn the canonical pose from an un-canonicalized collection of full and partial 3D point clouds. ConDor can also learn to consistently co-segment object parts without any supervision. Extensive quantitative results on four new metrics show that our approach outperforms existing methods while enabling new applications such as operation on depth images and annotation transfer.
    Concentration of Random Feature Matrices in High-Dimensions. (arXiv:2204.06935v1 [stat.ML])
    The spectra of random feature matrices provide essential information on the conditioning of the linear system used in random feature regression problems and are thus connected to the consistency and generalization of random feature models. Random feature matrices are asymmetric rectangular nonlinear matrices depending on two input variables, the data and the weights, which can make their characterization challenging. We consider two settings for the two input variables, either both are random variables or one is a random variable and the other is well-separated, i.e. there is a minimum distance between points. With conditions on the dimension, the complexity ratio, and the sampling variance, we show that the singular values of these matrices concentrate near their full expectation and near one with high-probability. In particular, since the dimension depends only on the logarithm of the number of random weights or the number of data points, our complexity bounds can be achieved even in moderate dimensions for many practical setting. The theoretical results are verified with numerical experiments.
    Time Series of Non-Additive Metrics: Identification and Interpretation of Contributing Factors of Variance by Linear Decomposition. (arXiv:2204.06688v1 [cs.LG])
    The research paper addresses linear decomposition of time series of non-additive metrics that allows for the identification and interpretation of contributing factors (input features) of variance. Non-additive metrics, such as ratios, are widely used in a variety of domains. It commonly requires preceding aggregations of underlying variables that are used to calculate the metric of interest. The latest poses a dimensionality challenge when the input features and underlying variables are formed as two-dimensional arrays along elements, such as account or customer identifications, and time points. It rules out direct modeling of the time series of a non-additive metric as a function of input features. The article discusses a five-step approach: (1) segmentations of input features and the underlying variables of the metric that are supported by unsupervised autoencoders, (2) univariate or joint fittings of the metric by the aggregated input features on the segmented domains, (3) transformations of pre-screened input features according to the fitted models, (4) aggregation of the transformed features as time series, and (5) modelling of the metric time series as a sum of constrained linear effects of the aggregated features. Alternatively, approximation by numerical differentiation has been considered to linearize the metric. It allows for element level univariate or joint modeling of step (2). The process of these analytical steps allows for a backward-looking explanatory decomposition of the metric as a sum of time series of the survived input features. The paper includes a synthetic example that studies loss-to-balance monthly rates of a hypothetical retail credit portfolio. To validate that no latent factors other than the survived input features have significant impacts on the metric, Statistical Process Control has been introduced for the residual time series.
    End-to-end multi-particle reconstruction in high occupancy imaging calorimeters with graph neural networks. (arXiv:2204.01681v2 [physics.ins-det] UPDATED)
    We present an end-to-end reconstruction algorithm to build particle candidates from detector hits in next-generation granular calorimeters similar to that foreseen for the high-luminosity upgrade of the CMS detector. The algorithm exploits a distance-weighted graph neural network, trained with object condensation, a graph segmentation technique. Through a single-shot approach, the reconstruction task is paired with energy regression. We describe the reconstruction performance in terms of efficiency as well as in terms of energy resolution. In addition, we show the jet reconstruction performance of our method and discuss its inference computational cost. To our knowledge, this work is the first-ever example of single-shot calorimetric reconstruction of ${\cal O}(1000)$ particles in high-luminosity conditions with 200 pileup.
    Machine Learning State-of-the-Art with Uncertainties. (arXiv:2204.05173v2 [cs.LG] UPDATED)
    With the availability of data, hardware, software ecosystem and relevant skill sets, the machine learning community is undergoing a rapid development with new architectures and approaches appearing at high frequency every year. In this article, we conduct an exemplary image classification study in order to demonstrate how confidence intervals around accuracy measurements can greatly enhance the communication of research results as well as impact the reviewing process. In addition, we explore the hallmarks and limitations of this approximation. We discuss the relevance of this approach reflecting on a spotlight publication of ICLR22. A reproducible workflow is made available as an open-source adjoint to this publication. Based on our discussion, we make suggestions for improving the authoring and reviewing process of machine learning articles.
    Characterizing the Fundamental Trade-offs in Learning Invariant Representations. (arXiv:2109.03386v2 [cs.LG] UPDATED)
    Many applications of representation learning, such as privacy-preservation, algorithmic fairness, and domain adaptation, desire explicit control over semantic information being discarded. This goal is formulated as satisfying two objectives: maximizing utility for predicting a target attribute while simultaneously being independent or invariant with respect to a known semantic attribute. Solutions to such problems lead to trade-offs between the two objectives when they are competing with each other. While existing works study bounds on these trade-offs, three questions still remain outstanding: \emph{What are the exact fundamental trade-offs between utility and invariance?}, 2) \emph{What is the optimal dimensionality of the representation?}, and 3) \emph{What are the encoders (mapping data to a representation) that achieve the exact fundamental trade-offs and how can we estimate them from data?} This paper addresses these questions. We adopt a functional analysis perspective and derive closed-form solutions for the global optima of the underlying optimization problems under mild assumptions, which in turn yields closed formulae for the exact trade-offs, optimal representation dimensionality, and the corresponding encoders. We also numerically quantify the trade-offs on representative problems and compare them to those achieved by baseline invariant representation learning algorithms.
    StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2. (arXiv:2112.14683v2 [cs.CV] UPDATED)
    Videos show continuous events, yet most $-$ if not all $-$ video synthesis frameworks treat them discretely in time. In this work, we think of videos of what they should be $-$ time-continuous signals, and extend the paradigm of neural representations to build a continuous-time video generator. For this, we first design continuous motion representations through the lens of positional embeddings. Then, we explore the question of training on very sparse videos and demonstrate that a good generator can be learned by using as few as 2 frames per clip. After that, we rethink the traditional image + video discriminators pair and design a holistic discriminator that aggregates temporal information by simply concatenating frames' features. This decreases the training cost and provides richer learning signal to the generator, making it possible to train directly on 1024$^2$ videos for the first time. We build our model on top of StyleGAN2 and it is just ${\approx}5\%$ more expensive to train at the same resolution while achieving almost the same image quality. Moreover, our latent space features similar properties, enabling spatial manipulations that our method can propagate in time. We can generate arbitrarily long videos at arbitrary high frame rate, while prior work struggles to generate even 64 frames at a fixed rate. Our model is tested on four modern 256$^2$ and one 1024$^2$-resolution video synthesis benchmarks. In terms of sheer metrics, it performs on average ${\approx}30\%$ better than the closest runner-up. Project website: https://universome.github.io.
    Kernel Thinning. (arXiv:2105.05842v7 [stat.ML] UPDATED)
    We introduce kernel thinning, a new procedure for compressing a distribution $\mathbb{P}$ more effectively than i.i.d. sampling or standard thinning. Given a suitable reproducing kernel $\mathbf{k}$ and $\mathcal{O}(n^2)$ time, kernel thinning compresses an $n$-point approximation to $\mathbb{P}$ into a $\sqrt{n}$-point approximation with comparable worst-case integration error across the associated reproducing kernel Hilbert space. With high probability, the maximum discrepancy in integration error is $\mathcal{O}_d(n^{-1/2}\sqrt{\log n})$ for compactly supported $\mathbb{P}$ and $\mathcal{O}_d(n^{-\frac{1}{2}} (\log n)^{(d+1)/2}\sqrt{\log\log n})$ for sub-exponential $\mathbb{P}$ on $\mathbb{R}^d$. In contrast, an equal-sized i.i.d. sample from $\mathbb{P}$ suffers $\Omega(n^{-1/4})$ integration error. Our sub-exponential guarantees resemble the classical quasi-Monte Carlo error rates for uniform $\mathbb{P}$ on $[0,1]^d$ but apply to general distributions on $\mathbb{R}^d$ and a wide range of common kernels. We use our results to derive explicit non-asymptotic maximum mean discrepancy bounds for Gaussian, Mat\'ern, and B-spline kernels and present two vignettes illustrating the practical benefits of kernel thinning over i.i.d. sampling and standard Markov chain Monte Carlo thinning, in dimensions $d=2$ through $100$.
    Epileptic Seizure Risk Assessment by Multi-Channel Imaging of the EEG. (arXiv:2204.07034v1 [eess.SP])
    Refractory epileptic patients can suffer a seizure at any moment. Seizure prediction would substantially improve their lives. In this work, based on scalp EEG and its transformation into images, the likelihood of an epileptic seizure occurring at any moment is computed using an average of the softmax layer output (the likelihood) of a CNN, instead of the output of the classification layer. Results show that by analyzing the likelihood and thresholding it, prediction has higher sensitivity or a lower FPR/h. The best threshold for the likelihood was higher than 50% for 5 patients, and was lower for the remaining 36. However, more testing is needed, especially in new seizures, to better assess the real performance of this method. This work is a proof of concept with a positive outlook.
    Learning Spectral Unions of Partial Deformable 3D Shapes. (arXiv:2104.00514v2 [cs.GR] UPDATED)
    Spectral geometric methods have brought revolutionary changes to the field of geometry processing. Of particular interest is the study of the Laplacian spectrum as a compact, isometry and permutation-invariant representation of a shape. Some recent works show how the intrinsic geometry of a full shape can be recovered from its spectrum, but there are approaches that consider the more challenging problem of recovering the geometry from the spectral information of partial shapes. In this paper, we propose a possible way to fill this gap. We introduce a learning-based method to estimate the Laplacian spectrum of the union of partial non-rigid 3D shapes, without actually computing the 3D geometry of the union or any correspondence between those partial shapes. We do so by operating purely in the spectral domain and by defining the union operation between short sequences of eigenvalues. We show that the approximated union spectrum can be used as-is to reconstruct the complete geometry [MRC*19], perform region localization on a template [RTO*19] and retrieve shapes from a database, generalizing ShapeDNA [RWP06] to work with partialities. Working with eigenvalues allows us to deal with unknown correspondence, different sampling, and different discretizations (point clouds and meshes alike), making this operation especially robust and general. Our approach is data-driven and can generalize to isometric and non-isometric deformations of the surface, as long as these stay within the same semantic class (e.g., human bodies or horses), as well as to partiality artifacts not seen at training time.
    A Neural Network based Framework for Effective Laparoscopic Video Quality Assessment. (arXiv:2202.04517v2 [eess.IV] UPDATED)
    Video quality assessment is a challenging problem having a critical significance in the context of medical imaging. For instance, in laparoscopic surgery, the acquired video data suffers from different kinds of distortion that not only hinder surgery performance but also affect the execution of subsequent tasks in surgical navigation and robotic surgeries. For this reason, we propose in this paper neural network-based approaches for distortion classification as well as quality prediction. More precisely, a Residual Network (ResNet) based approach is firstly developed for simultaneous ranking and classification task. Then, this architecture is extended to make it appropriate for the quality prediction task by using an additional Fully Connected Neural Network (FCNN). To train the overall architecture (ResNet and FCNN models), transfer learning and end-to-end learning approaches are investigated. Experimental results, carried out on a new laparoscopic video quality database, have shown the efficiency of the proposed methods compared to recent conventional and deep learning based approaches.
    SemiMultiPose: A Semi-supervised Multi-animal Pose Estimation Framework. (arXiv:2204.07072v1 [cs.CV])
    Multi-animal pose estimation is essential for studying animals' social behaviors in neuroscience and neuroethology. Advanced approaches have been proposed to support multi-animal estimation and achieve state-of-the-art performance. However, these models rarely exploit unlabeled data during training even though real world applications have exponentially more unlabeled frames than labeled frames. Manually adding dense annotations for a large number of images or videos is costly and labor-intensive, especially for multiple instances. Given these deficiencies, we propose a novel semi-supervised architecture for multi-animal pose estimation, leveraging the abundant structures pervasive in unlabeled frames in behavior videos to enhance training, which is critical for sparsely-labeled problems. The resulting algorithm will provide superior multi-animal pose estimation results on three animal experiments compared to the state-of-the-art baseline and exhibits more predictive power in sparsely-labeled data regimes.
    Transformers and the representation of biomedical background knowledge. (arXiv:2202.02432v2 [cs.CL] UPDATED)
    BioBERT and BioMegatron are Transformers models adapted for the biomedical domain based on publicly available biomedical corpora. As such, they have the potential to encode large-scale biological knowledge. We investigate the encoding and representation of biological knowledge in these models, and its potential utility to support inference in cancer precision medicine - namely, the interpretation of the clinical significance of genomic alterations. We compare the performance of different transformer baselines; we use probing to determine the consistency of encodings for distinct entities; and we use clustering methods to compare and contrast the internal properties of the embeddings for genes, variants, drugs and diseases. We show that these models do indeed encode biological knowledge, although some of this is lost in fine-tuning for specific tasks. Finally, we analyse how the models behave with regard to biases and imbalances in the dataset.
    Q-TART: Quickly Training for Adversarial Robustness and in-Transferability. (arXiv:2204.07024v1 [cs.CV])
    Raw deep neural network (DNN) performance is not enough; in real-world settings, computational load, training efficiency and adversarial security are just as or even more important. We propose to simultaneously tackle Performance, Efficiency, and Robustness, using our proposed algorithm Q-TART, Quickly Train for Adversarial Robustness and in-Transferability. Q-TART follows the intuition that samples highly susceptible to noise strongly affect the decision boundaries learned by DNNs, which in turn degrades their performance and adversarial susceptibility. By identifying and removing such samples, we demonstrate improved performance and adversarial robustness while using only a subset of the training data. Through our experiments we highlight Q-TART's high performance across multiple Dataset-DNN combinations, including ImageNet, and provide insights into the complementary behavior of Q-TART alongside existing adversarial training approaches to increase robustness by over 1.3% while using up to 17.9% less training time.
    HASA: Hybrid Architecture Search with Aggregation Strategy for Echinococcosis Classification and Ovary Segmentation in Ultrasound Images. (arXiv:2204.06697v1 [cs.CV])
    Different from handcrafted features, deep neural networks can automatically learn task-specific features from data. Due to this data-driven nature, they have achieved remarkable success in various areas. However, manual design and selection of suitable network architectures are time-consuming and require substantial effort of human experts. To address this problem, researchers have proposed neural architecture search (NAS) algorithms which can automatically generate network architectures but suffer from heavy computational cost and instability if searching from scratch. In this paper, we propose a hybrid NAS framework for ultrasound (US) image classification and segmentation. The hybrid framework consists of a pre-trained backbone and several searched cells (i.e., network building blocks), which takes advantage of the strengths of both NAS and the expert knowledge from existing convolutional neural networks. Specifically, two effective and lightweight operations, a mixed depth-wise convolution operator and a squeeze-and-excitation block, are introduced into the candidate operations to enhance the variety and capacity of the searched cells. These two operations not only decrease model parameters but also boost network performance. Moreover, we propose a re-aggregation strategy for the searched cells, aiming to further improve the performance for different vision tasks. We tested our method on two large US image datasets, including a 9-class echinococcosis dataset containing 9566 images for classification and an ovary dataset containing 3204 images for segmentation. Ablation experiments and comparison with other handcrafted or automatically searched architectures demonstrate that our method can generate more powerful and lightweight models for the above US image classification and segmentation tasks.
    Streamable Neural Audio Synthesis With Non-Causal Convolutions. (arXiv:2204.07064v1 [cs.SD])
    Deep learning models are mostly used in an offline inference fashion. However, this strongly limits the use of these models inside audio generation setups, as most creative workflows are based on real-time digital signal processing. Although approaches based on recurrent networks can be naturally adapted to this buffer-based computation, the use of convolutions still poses some serious challenges. To tackle this issue, the use of causal streaming convolutions have been proposed. However, this requires specific complexified training and can impact the resulting audio quality. In this paper, we introduce a new method allowing to produce non-causal streaming models. This allows to make any convolutional model compatible with real-time buffer-based processing. As our method is based on a post-training reconfiguration of the model, we show that it is able to transform models trained without causal constraints into a streaming model. We show how our method can be adapted to fit complex architectures with parallel branches. To evaluate our method, we apply it on the recent RAVE model, which provides high-quality real-time audio synthesis. We test our approach on multiple music and speech datasets and show that it is faster than overlap-add methods, while having no impact on the generation quality. Finally, we introduce two open-source implementation of our work as Max/MSP and PureData externals, and as a VST audio plugin. This allows to endow traditional digital audio workstation with real-time neural audio synthesis on a laptop CPU.
    Geometric Deep Learning to Identify the Critical 3D Structural Features of the Optic Nerve Head for Glaucoma Diagnosis. (arXiv:2204.06931v1 [eess.IV])
    Purpose: The optic nerve head (ONH) undergoes complex and deep 3D morphological changes during the development and progression of glaucoma. Optical coherence tomography (OCT) is the current gold standard to visualize and quantify these changes, however the resulting 3D deep-tissue information has not yet been fully exploited for the diagnosis and prognosis of glaucoma. To this end, we aimed: (1) To compare the performance of two relatively recent geometric deep learning techniques in diagnosing glaucoma from a single OCT scan of the ONH; and (2) To identify the 3D structural features of the ONH that are critical for the diagnosis of glaucoma. Methods: In this study, we included a total of 2,247 non-glaucoma and 2,259 glaucoma scans from 1,725 subjects. All subjects had their ONHs imaged in 3D with Spectralis OCT. All OCT scans were automatically segmented using deep learning to identify major neural and connective tissues. Each ONH was then represented as a 3D point cloud. We used PointNet and dynamic graph convolutional neural network (DGCNN) to diagnose glaucoma from such 3D ONH point clouds and to identify the critical 3D structural features of the ONH for glaucoma diagnosis. Results: Both the DGCNN (AUC: 0.97$\pm$0.01) and PointNet (AUC: 0.95$\pm$0.02) were able to accurately detect glaucoma from 3D ONH point clouds. The critical points formed an hourglass pattern with most of them located in the inferior and superior quadrant of the ONH. Discussion: The diagnostic accuracy of both geometric deep learning approaches was excellent. Moreover, we were able to identify the critical 3D structural features of the ONH for glaucoma diagnosis that tremendously improved the transparency and interpretability of our method. Consequently, our approach may have strong potential to be used in clinical applications for the diagnosis and prognosis of a wide range of ophthalmic disorders.
    Integration of neural network and fuzzy logic decision making compared with bilayered neural network in the simulation of daily dew point temperature. (arXiv:2202.12256v2 [cs.LG] UPDATED)
    In this research, dew point temperature (DPT) is simulated using the data-driven approach. Adaptive Neuro-Fuzzy Inference System (ANFIS) is utilized as a data-driven technique to forecast this parameter at Tabriz in East Azerbaijan. Various input patterns, namely T min, T max, and T mean, are utilized for training the architecture whilst DPT is the model's output. The findings indicate that, in general, ANFIS method is capable of identifying data patterns with a high degree of accuracy. However, the approach demonstrates that processing time and computer resources may substantially increase by adding additional functions. Based on the results, the number of iterations and computing resources might change dramatically if new functionalities are included. As a result, tuning parameters have to be optimized inside the method framework. The findings demonstrate a high agreement between results by the data-driven technique (machine learning method) and the observed data. Using this prediction toolkit, DPT can be adequately forecasted solely based on the temperature distribution of Tabriz. This kind of modeling is extremely promising for predicting DPT at various sites. Besides, this study thoroughly compares the Bilayered Neural Network (BNN) and ANFIS models on various scales. Whilst the ANFIS model is extremely stable for almost all numbers of membership functions, the BNN model is highly sensitive to this scale factor to predict DPT.
    Improving Top-K Decoding for Non-Autoregressive Semantic Parsing via Intent Conditioning. (arXiv:2204.06748v1 [cs.CL])
    Semantic parsing (SP) is a core component of modern virtual assistants like Google Assistant and Amazon Alexa. While sequence-to-sequence-based auto-regressive (AR) approaches are common for conversational semantic parsing, recent studies employ non-autoregressive (NAR) decoders and reduce inference latency while maintaining competitive parsing quality. However, a major drawback of NAR decoders is the difficulty of generating top-k (i.e., k-best) outputs with approaches such as beam search. To address this challenge, we propose a novel NAR semantic parser that introduces intent conditioning on the decoder. Inspired by the traditional intent and slot tagging parsers, we decouple the top-level intent prediction from the rest of a parse. As the top-level intent largely governs the syntax and semantics of a parse, the intent conditioning allows the model to better control beam search and improves the quality and diversity of top-k outputs. We introduce a hybrid teacher-forcing approach to avoid training and inference mismatch. We evaluate the proposed NAR on conversational SP datasets, TOP & TOPv2. Like the existing NAR models, we maintain the O(1) decoding time complexity while generating more diverse outputs and improving the top-3 exact match (EM) by 2.4 points. In comparison with AR models, our model speeds up beam search inference by 6.7 times on CPU with competitive top-k EM.
    A Collection of Deep Learning-based Feature-Free Approaches for Characterizing Single-Objective Continuous Fitness Landscapes. (arXiv:2204.05752v2 [cs.LG] UPDATED)
    Exploratory Landscape Analysis is a powerful technique for numerically characterizing landscapes of single-objective continuous optimization problems. Landscape insights are crucial both for problem understanding as well as for assessing benchmark set diversity and composition. Despite the irrefutable usefulness of these features, they suffer from their own ailments and downsides. Hence, in this work we provide a collection of different approaches to characterize optimization landscapes. Similar to conventional landscape features, we require a small initial sample. However, instead of computing features based on that sample, we develop alternative representations of the original sample. These range from point clouds to 2D images and, therefore, are entirely feature-free. We demonstrate and validate our devised methods on the BBOB testbed and predict, with the help of Deep Learning, the high-level, expert-based landscape properties such as the degree of multimodality and the existence of funnel structures. The quality of our approaches is on par with methods relying on the traditional landscape features. Thereby, we provide an exciting new perspective on every research area which utilizes problem information such as problem understanding and algorithm design as well as automated algorithm configuration and selection.
    Proceedings of TDA: Applications of Topological Data Analysis to Data Science, Artificial Intelligence, and Machine Learning Workshop at SDM 2022. (arXiv:2204.01142v2 [math.AT] UPDATED)
    Topological Data Analysis (TDA) is a rigorous framework that borrows techniques from geometric and algebraic topology, category theory, and combinatorics in order to study the "shape" of such complex high-dimensional data. Research in this area has grown significantly over the last several years bringing a deeply rooted theory to bear on practical applications in areas such as genomics, natural language processing, medicine, cybersecurity, energy, and climate change. Within some of these areas, TDA has also been used to augment AI and ML techniques. We believe there is further utility to be gained in this space that can be facilitated by a workshop bringing together experts (both theorists and practitioners) and non-experts. Currently there is an active community of pure mathematicians with research interests in developing and exploring the theoretical and computational aspects of TDA. Applied mathematicians and other practitioners are also present in community but do not represent a majority. This speaks to the primary aim of this workshop which is to grow a wider community of interest in TDA. By fostering meaningful exchanges between these groups, from across the government, academia, and industry, we hope to create new synergies that can only come through building a mutual comprehensive awareness of the problem and solution spaces.
    ULF: Unsupervised Labeling Function Correction using Cross-Validation for Weak Supervision. (arXiv:2204.06863v1 [cs.LG])
    A way to overcome expensive and time-consuming manual data labeling is weak supervision - automatic annotation of data samples via a predefined set of labeling functions (LFs), rule-based mechanisms that generate potentially erroneous labels. In this work, we investigate noise reduction techniques for weak supervision based on the principle of k-fold cross-validation. In particular, we extend two frameworks for detecting the erroneous samples in manually annotated data to the weakly supervised setting. Our methods profit from leveraging the information about matching LFs and detect noisy samples more accurately. We also introduce a new algorithm for denoising the weakly annotated data called ULF, that refines the allocation of LFs to classes by estimating the reliable LFs-to-classes joint matrix. Evaluation on several datasets shows that ULF successfully improves weakly supervised learning without using any manually labeled data.
    A Melody-Unsupervision Model for Singing Voice Synthesis. (arXiv:2110.06546v2 [eess.AS] UPDATED)
    Recent studies in singing voice synthesis have achieved high-quality results leveraging advances in text-to-speech models based on deep neural networks. One of the main issues in training singing voice synthesis models is that they require melody and lyric labels to be temporally aligned with audio data. The temporal alignment is a time-exhausting manual work in preparing for the training data. To address the issue, we propose a melody-unsupervision model that requires only audio-and-lyrics pairs without temporal alignment in training time but generates singing voice audio given a melody and lyrics input in inference time. The proposed model is composed of a phoneme classifier and a singing voice generator jointly trained in an end-to-end manner. The model can be fine-tuned by adjusting the amount of supervision with temporally aligned melody labels. Through experiments in melody-unsupervision and semi-supervision settings, we compare the audio quality of synthesized singing voice. We also show that the proposed model is capable of being trained with speech audio and text labels but can generate singing voice in inference time.
    Unsupervised Temporal Learning on Monocular Videos for 3D Human Pose Estimation. (arXiv:2012.01511v3 [cs.CV] UPDATED)
    In this paper we propose an unsupervised learning method to extract temporal information on monocular videos, where we detect and encode subject of interest in each frame and leverage contrastive self-supervised (CSS) learning to extract rich latent vectors. Instead of simply treating the latent features of nearby frames as positive pairs and those of temporally-distant ones as negative pairs as in other CSS approaches, we explicitly disentangle each latent vector into a time-variant component and a time-invariant one. We then show that applying CSS only to the time-variant features and encouraging a gradual transition on them between nearby and away frames while also reconstructing the input, extract rich temporal features into the time-variant component, well-suited for human pose estimation. Our approach reduces error by about 50\% compared to the standard CSS strategies, outperforms other unsupervised single-view methods and matches the performance of multi-view techniques.
    Fairness without Imputation: A Decision Tree Approach for Fair Prediction with Missing Values. (arXiv:2109.10431v2 [cs.LG] UPDATED)
    We investigate the fairness concerns of training a machine learning model using data with missing values. Even though there are a number of fairness intervention methods in the literature, most of them require a complete training set as input. In practice, data can have missing values, and data missing patterns can depend on group attributes (e.g. gender or race). Simply applying off-the-shelf fair learning algorithms to an imputed dataset may lead to an unfair model. In this paper, we first theoretically analyze different sources of discrimination risks when training with an imputed dataset. Then, we propose an integrated approach based on decision trees that does not require a separate process of imputation and learning. Instead, we train a tree with missing incorporated as attribute (MIA), which does not require explicit imputation, and we optimize a fairness-regularized objective function. We demonstrate that our approach outperforms existing fairness intervention methods applied to an imputed dataset, through several experiments on real-world datasets.
    A Simple and Efficient Sampling-based Algorithm for General Reachability Analysis. (arXiv:2112.05745v3 [eess.SY] UPDATED)
    In this work, we analyze an efficient sampling-based algorithm for general-purpose reachability analysis, which remains a notoriously challenging problem with applications ranging from neural network verification to safety analysis of dynamical systems. By sampling inputs, evaluating their images in the true reachable set, and taking their $\epsilon$-padded convex hull as a set estimator, this algorithm applies to general problem settings and is simple to implement. Our main contribution is the derivation of asymptotic and finite-sample accuracy guarantees using random set theory. This analysis informs algorithmic design to obtain an $\epsilon$-close reachable set approximation with high probability, provides insights into which reachability problems are most challenging, and motivates safety-critical applications of the technique. On a neural network verification task, we show that this approach is more accurate and significantly faster than prior work. Informed by our analysis, we also design a robust model predictive controller that we demonstrate in hardware experiments.
    Procrastinated Tree Search: Black-box Optimization with Delayed, Noisy, and Multi-Fidelity Feedback. (arXiv:2110.07232v2 [cs.LG] UPDATED)
    In black-box optimization problems, we aim to maximize an unknown objective function, where the function is only accessible through feedbacks of an evaluation or simulation oracle. In real-life, the feedbacks of such oracles are often noisy and available after some unknown delay that may depend on the computation time of the oracle. Additionally, if the exact evaluations are expensive but coarse approximations are available at a lower cost, the feedbacks can have multi-fidelity. In order to address this problem, we propose a generic extension of hierarchical optimistic tree search (HOO), called ProCrastinated Tree Search (PCTS), that flexibly accommodates a delay and noise-tolerant bandit algorithm. We provide a generic proof technique to quantify regret of PCTS under delayed, noisy, and multi-fidelity feedbacks. Specifically, we derive regret bounds of PCTS enabled with delayed-UCB1 (DUCB1) and delayed-UCB-V (DUCBV) algorithms. Given a horizon $T$, PCTS retains the regret bound of non-delayed HOO for expected delay of $O(\log T)$ and worsens by $O(T^{\frac{1-\alpha}{d+2}})$ for expected delays of $O(T^{1-\alpha})$ for $\alpha \in (0,1]$. We experimentally validate on multiple synthetic functions and hyperparameter tuning problems that PCTS outperforms the state-of-the-art black-box optimization methods for feedbacks with different noise levels, delays, and fidelity.
    PEg TRAnsfer Workflow recognition challenge report: Does multi-modal data improve recognition?. (arXiv:2202.05821v2 [cs.LG] UPDATED)
    This paper presents the design and results of the "PEg TRAnsfert Workflow recognition" (PETRAW) challenge whose objective was to develop surgical workflow recognition methods based on one or several modalities, among video, kinematic, and segmentation data, in order to study their added value. The PETRAW challenge provided a data set of 150 peg transfer sequences performed on a virtual simulator. This data set was composed of videos, kinematics, semantic segmentation, and workflow annotations which described the sequences at three different granularity levels: phase, step, and activity. Five tasks were proposed to the participants: three of them were related to the recognition of all granularities with one of the available modalities, while the others addressed the recognition with a combination of modalities. Average application-dependent balanced accuracy (AD-Accuracy) was used as evaluation metric to take unbalanced classes into account and because it is more clinically relevant than a frame-by-frame score. Seven teams participated in at least one task and four of them in all tasks. Best results are obtained with the use of the video and the kinematics data with an AD-Accuracy between 93% and 90% for the four teams who participated in all tasks. The improvement between video/kinematic-based methods and the uni-modality ones was significant for all of the teams. However, the difference in testing execution time between the video/kinematic-based and the kinematic-based methods has to be taken into consideration. Is it relevant to spend 20 to 200 times more computing time for less than 3% of improvement? The PETRAW data set is publicly available at www.synapse.org/PETRAW to encourage further research in surgical workflow recognition.
    Modelling Non-Smooth Signals with Complex Spectral Structure. (arXiv:2203.06997v2 [stat.ML] UPDATED)
    The Gaussian Process Convolution Model (GPCM; Tobar et al., 2015a) is a model for signals with complex spectral structure. A significant limitation of the GPCM is that it assumes a rapidly decaying spectrum: it can only model smooth signals. Moreover, inference in the GPCM currently requires (1) a mean-field assumption, resulting in poorly calibrated uncertainties, and (2) a tedious variational optimisation of large covariance matrices. We redesign the GPCM model to induce a richer distribution over the spectrum with relaxed assumptions about smoothness: the Causal Gaussian Process Convolution Model (CGPCM) introduces a causality assumption into the GPCM, and the Rough Gaussian Process Convolution Model (RGPCM) can be interpreted as a Bayesian nonparametric generalisation of the fractional Ornstein-Uhlenbeck process. We also propose a more effective variational inference scheme, going beyond the mean-field assumption: we design a Gibbs sampler which directly samples from the optimal variational solution, circumventing any variational optimisation entirely. The proposed variations of the GPCM are validated in experiments on synthetic and real-world data, showing promising results.
    Adversarial Parameter Defense by Multi-Step Risk Minimization. (arXiv:2109.02889v2 [cs.LG] UPDATED)
    Previous studies demonstrate DNNs' vulnerability to adversarial examples and adversarial training can establish a defense to adversarial examples. In addition, recent studies show that deep neural networks also exhibit vulnerability to parameter corruptions. The vulnerability of model parameters is of crucial value to the study of model robustness and generalization. In this work, we introduce the concept of parameter corruption and propose to leverage the loss change indicators for measuring the flatness of the loss basin and the parameter robustness of neural network parameters. On such basis, we analyze parameter corruptions and propose the multi-step adversarial corruption algorithm. To enhance neural networks, we propose the adversarial parameter defense algorithm that minimizes the average risk of multiple adversarial parameter corruptions. Experimental results show that the proposed algorithm can improve both the parameter robustness and accuracy of neural networks.
    The Pseudo Projection Operator: Applications of Deep Learning to Projection Based Filtering in Non-Trivial Frequency Regimes. (arXiv:2111.07140v3 [eess.SP] UPDATED)
    Traditional frequency based projection filters, or projection operators (PO), separate signal and noise through a series of transformations which remove frequencies where noise is present. However, this technique relies on a priori knowledge of what frequencies contain signal and noise and that these frequencies do not overlap, which is difficult to achieve in practice. To address these issues, we introduce a PO-neural network hybrid model, the Pseudo Projection Operator (PPO), which leverages a neural network to perform frequency selection. We compare the filtering capabilities of a PPO, PO, and denoising autoencoder (DAE) on the University of Rochester Multi-Modal Music Performance Dataset with a variety of added noise types. In the majority of experiments, the PPO outperforms both the PO and DAE. Based upon these results, we suggest future application of the PPO to filtering problems in the physical and biological sciences.
    Planting Undetectable Backdoors in Machine Learning Models. (arXiv:2204.06974v1 [cs.LG])
    Given the computational cost and technical expertise required to train machine learning models, users may delegate the task of learning to a service provider. We show how a malicious learner can plant an undetectable backdoor into a classifier. On the surface, such a backdoored classifier behaves normally, but in reality, the learner maintains a mechanism for changing the classification of any input, with only a slight perturbation. Importantly, without the appropriate "backdoor key", the mechanism is hidden and cannot be detected by any computationally-bounded observer. We demonstrate two frameworks for planting undetectable backdoors, with incomparable guarantees. First, we show how to plant a backdoor in any model, using digital signature schemes. The construction guarantees that given black-box access to the original model and the backdoored version, it is computationally infeasible to find even a single input where they differ. This property implies that the backdoored model has generalization error comparable with the original model. Second, we demonstrate how to insert undetectable backdoors in models trained using the Random Fourier Features (RFF) learning paradigm or in Random ReLU networks. In this construction, undetectability holds against powerful white-box distinguishers: given a complete description of the network and the training data, no efficient distinguisher can guess whether the model is "clean" or contains a backdoor. Our construction of undetectable backdoors also sheds light on the related issue of robustness to adversarial examples. In particular, our construction can produce a classifier that is indistinguishable from an "adversarially robust" classifier, but where every input has an adversarial example! In summary, the existence of undetectable backdoors represent a significant theoretical roadblock to certifying adversarial robustness.
    Your fairness may vary: Pretrained language model fairness in toxic text classification. (arXiv:2108.01250v3 [cs.CL] UPDATED)
    The popularity of pretrained language models in natural language processing systems calls for a careful evaluation of such models in down-stream tasks, which have a higher potential for societal impact. The evaluation of such systems usually focuses on accuracy measures. Our findings in this paper call for attention to be paid to fairness measures as well. Through the analysis of more than a dozen pretrained language models of varying sizes on two toxic text classification tasks (English), we demonstrate that focusing on accuracy measures alone can lead to models with wide variation in fairness characteristics. Specifically, we observe that fairness can vary even more than accuracy with increasing training data size and different random initializations. At the same time, we find that little of the fairness variation is explained by model size, despite claims in the literature. To improve model fairness without retraining, we show that two post-processing methods developed for structured, tabular data can be successfully applied to a range of pretrained language models. Warning: This paper contains samples of offensive text.
    Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning. (arXiv:2106.06047v2 [cs.LG] UPDATED)
    Federated learning is an emerging research paradigm enabling collaborative training of machine learning models among different organizations while keeping data private at each institution. Despite recent progress, there remain fundamental challenges such as the lack of convergence and the potential for catastrophic forgetting across real-world heterogeneous devices. In this paper, we demonstrate that self-attention-based architectures (e.g., Transformers) are more robust to distribution shifts and hence improve federated learning over heterogeneous data. Concretely, we conduct the first rigorous empirical investigation of different neural architectures across a range of federated algorithms, real-world benchmarks, and heterogeneous data splits. Our experiments show that simply replacing convolutional networks with Transformers can greatly reduce catastrophic forgetting of previous devices, accelerate convergence, and reach a better global model, especially when dealing with heterogeneous data. We release our code and pretrained models at https://github.com/Liangqiong/ViT-FL-main to encourage future exploration in robust architectures as an alternative to current research efforts on the optimization front.
    LDPC codes: tracking non-stationary channel noise using sequential variational Bayesian estimates. (arXiv:2204.07037v1 [eess.SP])
    We present a sequential Bayesian learning method for tracking non-stationary signal-to-noise ratios in LDPC codes using probabilistic graphical models. We represent the LDPC code as a cluster graph using a general purpose cluster graph construction algorithm called the layered trees running intersection property (LTRIP) algorithm. The channel noise estimator is a global Gamma cluster, which we extend to allow for Bayesian tracking of non-stationary noise variation. We evaluate our proposed model on real-world 5G drive test data. Our results show that our model is capable of tracking non-stationary channel noise, which outperforms an LDPC code with a fixed knowledge of the actual average channel noise.
    Exploring Dual Encoder Architectures for Question Answering. (arXiv:2204.07120v1 [cs.CL])
    Dual encoders have been used for question-answering (QA) and information retrieval (IR) tasks with good results. There are two major types of dual encoders, Siamese Dual Encoders (SDE), with parameters shared across two encoders, and Asymmetric Dual Encoder (ADE), with two distinctly parameterized encoders. In this work, we explore the dual encoder architectures for QA retrieval tasks. By evaluating on MS MARCO and the MultiReQA benchmark, we show that SDE performs significantly better than ADE. We further propose three different improved versions of ADEs. Based on the evaluation of QA retrieval tasks and direct analysis of the embeddings, we demonstrate that sharing parameters in projection layers would enable ADEs to perform competitively with SDEs.
    Character-focused Video Thumbnail Retrieval. (arXiv:2204.06563v1 [cs.CV])
    We explore retrieving character-focused video frames as candidates for being video thumbnails. To evaluate each frame of the video based on the character(s) present in it, characters (faces) are evaluated in two aspects: Facial-expression: We train a CNN model to measure whether a face has an acceptable facial expression for being in a video thumbnail. This model is trained to distinguish faces extracted from artworks/thumbnails, from faces extracted from random frames of videos. Prominence and interactions: Character(s) in the thumbnail should be important character(s) in the video, to prevent the algorithm from suggesting non-representative frames as candidates. We use face clustering to identify the characters in the video, and form a graph in which the prominence (frequency of appearance) of the character(s), and their interactions (co-occurrence) are captured. We use this graph to infer the relevance of the characters present in each candidate frame. Once every face is scored based on the two criteria above, we infer frame level scores by combining the scores for all the faces within a frame.
    CAMERO: Consistency Regularized Ensemble of Perturbed Language Models with Weight Sharing. (arXiv:2204.06625v1 [cs.CL])
    Model ensemble is a popular approach to produce a low-variance and well-generalized model. However, it induces large memory and inference costs, which are often not affordable for real-world deployment. Existing work has resorted to sharing weights among models. However, when increasing the proportion of the shared weights, the resulting models tend to be similar, and the benefits of using model ensemble diminish. To retain ensemble benefits while maintaining a low memory cost, we propose a consistency-regularized ensemble learning approach based on perturbed models, named CAMERO. Specifically, we share the weights of bottom layers across all models and apply different perturbations to the hidden representations for different models, which can effectively promote the model diversity. Meanwhile, we apply a prediction consistency regularizer across the perturbed models to control the variance due to the model diversity. Our experiments using large language models demonstrate that CAMERO significantly improves the generalization performance of the ensemble model. Specifically, CAMERO outperforms the standard ensemble of 8 BERT-base models on the GLUE benchmark by 0.7 with a significantly smaller model size (114.2M vs. 880.6M).
    Second Order Regret Bounds Against Generalized Expert Sequences under Partial Bandit Feedback. (arXiv:2204.06660v1 [cs.LG])
    We study the problem of expert advice under partial bandit feedback setting and create a sequential minimax optimal algorithm. Our algorithm works with a more general partial monitoring setting, where, in contrast to the classical bandit feedback, the losses can be revealed in an adversarial manner. Our algorithm adopts a universal prediction perspective, whose performance is analyzed with regret against a general expert selection sequence. The regret we study is against a general competition class that covers many settings (such as the switching or contextual experts settings) and the expert selection sequences in the competition class are determined by the application at hand. Our regret bounds are second order bounds in terms of the sum of squared losses and the normalized regret of our algorithm is invariant under arbitrary affine transforms of the loss sequence. Our algorithm is truly online and does not use any preliminary information about the loss sequences.
    A Unified Analysis of Dynamic Interactive Learning. (arXiv:2204.07071v1 [cs.LG])
    In this paper we investigate the problem of learning evolving concepts over a combinatorial structure. Previous work by Emamjomeh-Zadeh et al. [2020] introduced dynamics into interactive learning as a way to model non-static user preferences in clustering problems or recommender systems. We provide many useful contributions to this problem. First, we give a framework that captures both of the models analyzed by [Emamjomeh-Zadeh et al., 2020], which allows us to study any type of concept evolution and matches the same query complexity bounds and running time guarantees of the previous models. Using this general model we solve the open problem of closing the gap between the upper and lower bounds on query complexity. Finally, we study an efficient algorithm where the learner simply follows the feedback at each round, and we provide mistake bounds for low diameter graphs such as cliques, stars, and general o(log n) diameter graphs by using a Markov Chain model.
    Improving Computational Complexity in Statistical Models with Second-Order Information. (arXiv:2202.04219v3 [stat.ML] UPDATED)
    It is known that when the statistical models are singular, i.e., the Fisher information matrix at the true parameter is degenerate, the fixed step-size gradient descent algorithm takes polynomial number of steps in terms of the sample size $n$ to converge to a final statistical radius around the true parameter, which can be unsatisfactory for the application. To further improve that computational complexity, we consider the utilization of the second-order information in the design of optimization algorithms. Specifically, we study the normalized gradient descent (NormGD) algorithm for solving parameter estimation in parametric statistical models, which is a variant of gradient descent algorithm whose step size is scaled by the maximum eigenvalue of the Hessian matrix of the empirical loss function of statistical models. When the population loss function, i.e., the limit of the empirical loss function when $n$ goes to infinity, is homogeneous in all directions, we demonstrate that the NormGD iterates reach a final statistical radius around the true parameter after a logarithmic number of iterations in terms of $n$. Therefore, for fixed dimension $d$, the NormGD algorithm achieves the optimal overall computational complexity $\mathcal{O}(n)$ to reach the final statistical radius. This computational complexity is cheaper than that of the fixed step-size gradient descent algorithm, which is of the order $\mathcal{O}(n^{\tau})$ for some $\tau > 1$, to reach the same statistical radius. We illustrate our general theory under two statistical models: generalized linear models and mixture models, and experimental results support our prediction with general theory.
    Sketching Algorithms and Lower Bounds for Ridge Regression. (arXiv:2204.06653v1 [cs.DS])
    We give a sketching-based iterative algorithm that computes $1+\varepsilon$ approximate solutions for the ridge regression problem $\min_x \|{Ax-b}\|_2^2 +\lambda\|{x}\|_2^2$ where $A \in \mathbb{R}^{n \times d}$ with $d \ge n$. Our algorithm, for a constant number of iterations (requiring a constant number of passes over the input), improves upon earlier work of Chowdhury et al., by requiring that the sketching matrix only has a weaker Approximate Matrix Multiplication (AMM) guarantee that depends on $\epsilon$, along with a constant subspace embedding guarantee. The earlier work instead requires that the sketching matrix have a subspace embedding guarantee that depends on $\epsilon$. For example, to produce a $1+\varepsilon$ approximate solution in $1$ iteration, which requires $2$ passes over the input, our algorithm requires the OSNAP embedding to have $m= O(n\sigma^2/\lambda\varepsilon)$ rows with a sparsity parameter $s = O(\log(n))$, whereas the earlier algorithm of Chowdhury et al., with the same number of rows of OSNAP requires a sparsity $s = O(\sqrt{\sigma^2/\lambda\varepsilon} \cdot \log(n))$, where $\sigma = \|{A}\|_2$ is the spectral norm of the matrix $A$. We also show that this algorithm can be used to give faster algorithms for kernel ridge regression. Finally, we show that the sketch size required for our algorithm is essentially optimal for a natural framework of algorithms for ridge regression by proving lower bounds on oblivious sketching matrices for AMM. The sketch size lower bounds for AMM may be of independent interest.
    Activation Regression for Continuous Domain Generalization with Applications to Crop Classification. (arXiv:2204.07030v1 [cs.CV])
    Geographic variance in satellite imagery impacts the ability of machine learning models to generalise to new regions. In this paper, we model geographic generalisation in medium resolution Landsat-8 satellite imagery as a continuous domain adaptation problem, demonstrating how models generalise better with appropriate domain knowledge. We develop a dataset spatially distributed across the entire continental United States, providing macroscopic insight into the effects of geography on crop classification in multi-spectral and temporally distributed satellite imagery. Our method demonstrates improved generalisability from 1) passing geographically correlated climate variables along with the satellite data to a Transformer model and 2) regressing on the model features to reconstruct these domain variables. Combined, we provide a novel perspective on geographic generalisation in satellite imagery and a simple-yet-effective approach to leverage domain knowledge. Code is available at: \url{https://github.com/samar-khanna/cropmap}
    Any-resolution Training for High-resolution Image Synthesis. (arXiv:2204.07156v1 [cs.CV])
    Generative models operate at fixed resolution, even though natural images come in a variety of sizes. As high-resolution details are downsampled away, and low-resolution images are discarded altogether, precious supervision is lost. We argue that every pixel matters and create datasets with variable-size images, collected at their native resolutions. Taking advantage of this data is challenging; high-resolution processing is costly, and current architectures can only process fixed-resolution data. We introduce continuous-scale training, a process that samples patches at random scales to train a new generator with variable output resolutions. First, conditioning the generator on a target scale allows us to generate higher resolutions images than previously possible, without adding layers to the model. Second, by conditioning on continuous coordinates, we can sample patches that still obey a consistent global layout, which also allows for scalable training at higher resolutions. Controlled FFHQ experiments show our method takes advantage of the multi-resolution training data better than discrete multi-scale approaches, achieving better FID scores and cleaner high-frequency details. We also train on other natural image domains including churches, mountains, and birds, and demonstrate arbitrary scale synthesis with both coherent global layouts and realistic local details, going beyond 2K resolution in our experiments. Our project page is available at: https://chail.github.io/anyres-gan/.
    Neonatal Bowel Sound Detection Using Convolutional Neural Network and Laplace Hidden Semi-Markov Model. (arXiv:2108.07467v2 [cs.SD] UPDATED)
    Abdominal auscultation is a convenient, safe and inexpensive method to assess bowel conditions, which is essential in neonatal care. It helps early detection of neonatal bowel dysfunctions and allows timely intervention. This paper presents a neonatal bowel sound detection method to assist the auscultation. Specifically, a Convolutional Neural Network (CNN) is proposed to classify peristalsis and non-peristalsis sounds. The classification is then optimized using a Laplace Hidden Semi-Markov Model (HSMM). The proposed method is validated on abdominal sounds from 49 newborn infants admitted to our tertiary Neonatal Intensive Care Unit (NICU). The results show that the method can effectively detect bowel sounds with accuracy and area under curve (AUC) score being 89.81% and 83.96% respectively, outperforming 13 baseline methods. Furthermore, the proposed Laplace HSMM refinement strategy is proven capable to enhance other bowel sound detection models. The outcomes of this work have the potential to facilitate future telehealth applications for neonatal care. The source code of our work can be found at: https://bitbucket.org/chirudeakin/neonatal-bowel-sound-classification/
    Optimal Training of Fair Predictive Models. (arXiv:1910.04109v3 [stat.ML] UPDATED)
    Recently there has been sustained interest in modifying prediction algorithms to satisfy fairness constraints. These constraints are typically complex nonlinear functionals of the observed data distribution. Focusing on the path-specific causal constraints proposed by Nabi and Shpitser (2018), we introduce new theoretical results and optimization techniques to make model training easier and more accurate. Specifically, we show how to reparameterize the observed data likelihood such that fairness constraints correspond directly to parameters that appear in the likelihood, transforming a complex constrained optimization objective into a simple optimization problem with box constraints. We also exploit methods from empirical likelihood theory in statistics to improve predictive performance by constraining baseline covariates, without requiring parametric models. We combine the merits of both proposals to optimize a hybrid reparameterized likelihood. The techniques presented here should be applicable more broadly to fair prediction proposals that impose constraints on predictive models.
    Masked Siamese Networks for Label-Efficient Learning. (arXiv:2204.07141v1 [cs.LG])
    We propose Masked Siamese Networks (MSN), a self-supervised learning framework for learning image representations. Our approach matches the representation of an image view containing randomly masked patches to the representation of the original unmasked image. This self-supervised pre-training strategy is particularly scalable when applied to Vision Transformers since only the unmasked patches are processed by the network. As a result, MSNs improve the scalability of joint-embedding architectures, while producing representations of a high semantic level that perform competitively on low-shot image classification. For instance, on ImageNet-1K, with only 5,000 annotated images, our base MSN model achieves 72.4% top-1 accuracy, and with 1% of ImageNet-1K labels, we achieve 75.7% top-1 accuracy, setting a new state-of-the-art for self-supervised learning on this benchmark. Our code is publicly available.
    Activation Map Adaptation for Effective Knowledge Distillation. (arXiv:2010.13500v2 [cs.CV] UPDATED)
    Model compression becomes a recent trend due to the requirement of deploying neural networks on embedded and mobile devices. Hence, both accuracy and efficiency are of critical importance. To explore a balance between them, a knowledge distillation strategy is proposed for general visual representation learning. It utilizes our well-designed activation map adaptive module to replace some blocks of the teacher network, exploring the most appropriate supervisory features adaptively during the training process. Using the teacher's hidden layer output to prompt the student network to train so as to transfer effective semantic information.To verify the effectiveness of our strategy, this paper applied our method to cifar-10 dataset. Results demonstrate that the method can boost the accuracy of the student network by 0.6% with 6.5% loss reduction, and significantly improve its training speed.
    Shedding New Light on the Language of the Dark Web. (arXiv:2204.06885v1 [cs.CL])
    The hidden nature and the limited accessibility of the Dark Web, combined with the lack of public datasets in this domain, make it difficult to study its inherent characteristics such as linguistic properties. Previous works on text classification of Dark Web domain have suggested that the use of deep neural models may be ineffective, potentially due to the linguistic differences between the Dark and Surface Webs. However, not much work has been done to uncover the linguistic characteristics of the Dark Web. This paper introduces CoDA, a publicly available Dark Web dataset consisting of 10000 web documents tailored towards text-based Dark Web analysis. By leveraging CoDA, we conduct a thorough linguistic analysis of the Dark Web and examine the textual differences between the Dark Web and the Surface Web. We also assess the performance of various methods of Dark Web page classification. Finally, we compare CoDA with an existing public Dark Web dataset and evaluate their suitability for various use cases.
    Estimating Structural Disparities for Face Models. (arXiv:2204.06562v1 [cs.CV])
    In machine learning, disparity metrics are often defined by measuring the difference in the performance or outcome of a model, across different sub-populations (groups) of datapoints. Thus, the inputs to disparity quantification consist of a model's predictions $\hat{y}$, the ground-truth labels for the predictions $y$, and group labels $g$ for the data points. Performance of the model for each group is calculated by comparing $\hat{y}$ and $y$ for the datapoints within a specific group, and as a result, disparity of performance across the different groups can be calculated. In many real world scenarios however, group labels ($g$) may not be available at scale during training and validation time, or collecting them might not be feasible or desirable as they could often be sensitive information. As a result, evaluating disparity metrics across categorical groups would not be feasible. On the other hand, in many scenarios noisy groupings may be obtainable using some form of a proxy, which would allow measuring disparity metrics across sub-populations. Here we explore performing such analysis on computer vision models trained on human faces, and on tasks such as face attribute prediction and affect estimation. Our experiments indicate that embeddings resulting from an off-the-shelf face recognition model, could meaningfully serve as a proxy for such estimation.
    ExPLoit: Extracting Private Labels in Split Learning. (arXiv:2112.01299v2 [cs.CR] UPDATED)
    Split learning is a popular technique used for vertical federated learning (VFL), where the goal is to jointly train a model on the private input and label data held by two parties. This technique uses a split-model, trained end-to-end, by exchanging the intermediate representations (IR) of the inputs and gradients of the IR between the two parties. We propose ExPLoit - a label-leakage attack that allows an adversarial input-owner to extract the private labels of the label-owner during split-learning. ExPLoit frames the attack as a supervised learning problem by using a novel loss function that combines gradient-matching and several regularization terms developed using key properties of the dataset and models. Our evaluations show that ExPLoit can uncover the private labels with near-perfect accuracy of up to 99.96%. Our findings underscore the need for better training techniques for VFL.
    A Study of Causal Confusion in Preference-Based Reward Learning. (arXiv:2204.06601v1 [cs.LG])
    Learning robot policies via preference-based reward learning is an increasingly popular method for customizing robot behavior. However, in recent years, there has been a growing body of anecdotal evidence that learning reward functions from preferences is prone to spurious correlations and reward gaming or hacking behaviors. While there is much anecdotal, empirical, and theoretical analysis of causal confusion and reward gaming behaviors both in reinforcement learning and imitation learning approaches that directly map from states to actions, we provide the first systematic study of causal confusion in the context of learning reward functions from preferences. To facilitate this study, we identify a set of three preference learning benchmark domains where we observe causal confusion when learning from offline datasets of pairwise trajectory preferences: a simple reacher domain, an assistive feeding domain, and an itch-scratching domain. To gain insight into this observed causal confusion, we present a sensitivity analysis that explores the effect of different factors--including the type of training data, reward model capacity, and feature dimensionality--on the robustness of rewards learned from preferences. We find evidence that learning rewards from pairwise trajectory preferences is highly sensitive and non-robust to spurious features and increasing model capacity, but not as sensitive to the type of training data. Videos, code, and supplemental results are available at https://sites.google.com/view/causal-reward-confusion.
    deep-significance - Easy and Meaningful Statistical Significance Testing in the Age of Neural Networks. (arXiv:2204.06815v1 [cs.LG])
    A lot of Machine Learning (ML) and Deep Learning (DL) research is of an empirical nature. Nevertheless, statistical significance testing (SST) is still not widely used. This endangers true progress, as seeming improvements over a baseline might be statistical flukes, leading follow-up research astray while wasting human and computational resources. Here, we provide an easy-to-use package containing different significance tests and utility functions specifically tailored towards research needs and usability.
    To Split or Not to Split: The Impact of Disparate Treatment in Classification. (arXiv:2002.04788v4 [cs.LG] UPDATED)
    Disparate treatment occurs when a machine learning model yields different decisions for individuals based on a sensitive attribute (e.g., age, sex). In domains where prediction accuracy is paramount, it could potentially be acceptable to fit a model which exhibits disparate treatment. To evaluate the effect of disparate treatment, we compare the performance of split classifiers (i.e., classifiers trained and deployed separately on each group) with group-blind classifiers (i.e., classifiers which do not use a sensitive attribute). We introduce the benefit-of-splitting for quantifying the performance improvement by splitting classifiers. Computing the benefit-of-splitting directly from its definition could be intractable since it involves solving optimization problems over an infinite-dimensional functional space. Under different performance measures, we (i) prove an equivalent expression for the benefit-of-splitting which can be efficiently computed by solving small-scale convex programs; (ii) provide sharp upper and lower bounds for the benefit-of-splitting which reveal precise conditions where a group-blind classifier will always suffer from a non-trivial performance gap from the split classifiers. In the finite sample regime, splitting is not necessarily beneficial and we provide data-dependent bounds to understand this effect. Finally, we validate our theoretical results through numerical experiments on both synthetic and real-world datasets.
    OpenCSI: An Open-Source Dataset for Indoor Localization Using CSI-Based Fingerprinting. (arXiv:2104.07963v3 [eess.SP] UPDATED)
    Many applications require accurate indoor localization. Fingerprint-based localization methods propose a solution to this problem, but rely on a radio map that is effort-intensive to acquire. We automate the radio map acquisition phase using a software-defined radio (SDR) and a wheeled robot. Furthermore, we open-source a radio map acquired with our automated tool for a 3GPP Long-Term Evolution (LTE) wireless link. To the best of our knowledge, this is the first publicly available radio map containing channel state information (CSI). Finally, we describe first localization experiments on this radio map using a convolutional neural network to regress for location coordinates.
    Tight Bounds for Quantum State Certification with Incoherent Measurements. (arXiv:2204.07155v1 [quant-ph])
    We consider the problem of quantum state certification, where we are given the description of a mixed state $\sigma \in \mathbb{C}^{d \times d}$, $n$ copies of a mixed state $\rho \in \mathbb{C}^{d \times d}$, and $\varepsilon > 0$, and we are asked to determine whether $\rho = \sigma$ or whether $\| \rho - \sigma \|_1 > \varepsilon$. When $\sigma$ is the maximally mixed state $\frac{1}{d} I_d$, this is known as mixedness testing. We focus on algorithms which use incoherent measurements, i.e. which only measure one copy of $\rho$ at a time. Unlike those that use entangled, multi-copy measurements, these can be implemented without persistent quantum memory and thus represent a large class of protocols that can be run on current or near-term devices. For mixedness testing, there is a folklore algorithm which uses incoherent measurements and only needs $O(d^{3/2} / \varepsilon^2)$ copies. The algorithm is non-adaptive, that is, its measurements are fixed ahead of time, and is known to be optimal for non-adaptive algorithms. However, when the algorithm can make arbitrary incoherent measurements, the best known lower bound is only $\Omega (d^{4/3} / \varepsilon^2)$ [Bubeck-Chen-Li '20], and it has been an outstanding open problem to close this polynomial gap. In this work, 1) we settle the copy complexity of mixedness testing with incoherent measurements and show that $\Omega (d^{3/2} / \varepsilon^2)$ copies are necessary, and 2) we show the instance-optimal bounds for state certification to general $\sigma$ first derived by [Chen-Li-O'Donnell '21] for non-adaptive measurements also hold for arbitrary incoherent measurements. Qualitatively, our results say that adaptivity does not help at all for these problems. Our results are based on new techniques that allow us to reduce the problem to understanding certain matrix martingales, which we believe may be of independent interest.
    CLUES: A Benchmark for Learning Classifiers using Natural Language Explanations. (arXiv:2204.07142v1 [cs.CL])
    Supervised learning has traditionally focused on inductive learning by observing labeled examples of a task. In contrast, humans have the ability to learn new concepts from language. Here, we explore training zero-shot classifiers for structured data purely from language. For this, we introduce CLUES, a benchmark for Classifier Learning Using natural language ExplanationS, consisting of a range of classification tasks over structured data along with natural language supervision in the form of explanations. CLUES consists of 36 real-world and 144 synthetic classification tasks. It contains crowdsourced explanations describing real-world tasks from multiple teachers and programmatically generated explanations for the synthetic tasks. To model the influence of explanations in classifying an example, we develop ExEnt, an entailment-based model that learns classifiers using explanations. ExEnt generalizes up to 18% better (relative) on novel tasks than a baseline that does not use explanations. We delineate key challenges for automated learning from explanations, addressing which can lead to progress on CLUES in the future. Code and datasets are available at: https://clues-benchmark.github.io.
    Modeling the effects of environmental and perceptual uncertainty using deterministic reinforcement learning dynamics with partial observability. (arXiv:2109.07259v2 [nlin.AO] UPDATED)
    Assessing the systemic effects of uncertainty that arises from agents' partial observation of the true states of the world is critical for understanding a wide range of scenarios. Yet, previous modeling work on agent learning and decision-making either lacks a systematic way to describe this source of uncertainty or puts the focus on obtaining optimal policies using complex models of the world that would impose an unrealistically high cognitive demand on real agents. In this work we aim to efficiently describe the emergent behavior of biologically plausible and parsimonious learning agents faced with partially observable worlds. Therefore we derive and present deterministic reinforcement learning dynamics where the agents observe the true state of the environment only partially. We showcase the broad applicability of our dynamics across different classes of partially observable agent-environment systems. We find that partial observability creates unintuitive benefits in a number of specific contexts, pointing the way to further research on a general understanding of such effects. For instance, partially observant agents can learn better outcomes faster, in a more stable way and even overcome social dilemmas. Furthermore, our method allows the application of dynamical systems theory to partially observable multiagent leaning. In this regard we find the emergence of catastrophic limit cycles, a critical slowing down of the learning processes between reward regimes and the separation of the learning dynamics into fast and slow directions, all caused by partial observability. Therefore, the presented dynamics have the potential to become a formal, yet practical, lightweight and robust tool for researchers in biology, social science and machine learning to systematically investigate the effects of interacting partially observant agents.
    Multimodal spatiotemporal graph neural networks for improved prediction of 30-day all-cause hospital readmission. (arXiv:2204.06766v1 [cs.LG])
    Measures to predict 30-day readmission are considered an important quality factor for hospitals as accurate predictions can reduce the overall cost of care by identifying high risk patients before they are discharged. While recent deep learning-based studies have shown promising empirical results on readmission prediction, several limitations exist that may hinder widespread clinical utility, such as (a) only patients with certain conditions are considered, (b) existing approaches do not leverage data temporality, (c) individual admissions are assumed independent of each other, which is unrealistic, (d) prior studies are usually limited to single source of data and single center data. To address these limitations, we propose a multimodal, modality-agnostic spatiotemporal graph neural network (MM-STGNN) for prediction of 30-day all-cause hospital readmission that fuses multimodal in-patient longitudinal data. By training and evaluating our methods using longitudinal chest radiographs and electronic health records from two independent centers, we demonstrate that MM-STGNN achieves AUROC of 0.79 on both primary and external datasets. Furthermore, MM-STGNN significantly outperforms the current clinical reference standard, LACE+ score (AUROC=0.61), on the primary dataset. For subset populations of patients with heart and vascular disease, our model also outperforms baselines on predicting 30-day readmission (e.g., 3.7 point improvement in AUROC in patients with heart disease). Lastly, qualitative model interpretability analysis indicates that while patients' primary diagnoses were not explicitly used to train the model, node features crucial for model prediction directly reflect patients' primary diagnoses. Importantly, our MM-STGNN is agnostic to node feature modalities and could be utilized to integrate multimodal data for triaging patients in various downstream resource allocation tasks.
    EvoSTS Forecasting: Evolutionary Sparse Time-Series Forecasting. (arXiv:2204.07066v1 [cs.NE])
    In this work, we highlight our novel evolutionary sparse time-series forecasting algorithm also known as EvoSTS. The algorithm attempts to evolutionary prioritize weights of Long Short-Term Memory (LSTM) Network that best minimize the reconstruction loss of a predicted signal using a learned sparse coded dictionary. In each generation of our evolutionary algorithm, a set number of children with the same initial weights are spawned. Each child undergoes a training step and adjusts their weights on the same data. Due to stochastic back-propagation, the set of children has a variety of weights with different levels of performance. The weights that best minimize the reconstruction loss with a given signal dictionary are passed to the next generation. The predictions from the best-performing weights of the first and last generation are compared. We found improvements while comparing the weights of these two generations. However, due to several confounding parameters and hyperparameter limitations, some of the weights had negligible improvements. To the best of our knowledge, this is the first attempt to use sparse coding in this way to optimize time series forecasting model weights, such as those of an LSTM network.
    A deep learning algorithm for reducing false positives in screening mammography. (arXiv:2204.06671v1 [cs.CV])
    Screening mammography improves breast cancer outcomes by enabling early detection and treatment. However, false positive callbacks for additional imaging from screening exams cause unnecessary procedures, patient anxiety, and financial burden. This work demonstrates an AI algorithm that reduces false positives by identifying mammograms not suspicious for breast cancer. We trained the algorithm to determine the absence of cancer using 123,248 2D digital mammograms (6,161 cancers) and performed a retrospective study on 14,831 screening exams (1,026 cancers) from 15 US and 3 UK sites. Retrospective evaluation of the algorithm on the largest of the US sites (11,592 mammograms, 101 cancers) a) left the cancer detection rate unaffected (p=0.02, non-inferiority margin 0.25 cancers per 1000 exams), b) reduced callbacks for diagnostic exams by 31.1% compared to standard clinical readings, c) reduced benign needle biopsies by 7.4%, and d) reduced screening exams requiring radiologist interpretation by 41.6% in the simulated clinical workflow. This work lays the foundation for semi-autonomous breast cancer screening systems that could benefit patients and healthcare systems by reducing false positives, unnecessary procedures, patient anxiety, and expenses.
    SNP2Vec: Scalable Self-Supervised Pre-Training for Genome-Wide Association Study. (arXiv:2204.06699v1 [cs.LG])
    Self-supervised pre-training methods have brought remarkable breakthroughs in the understanding of text, image, and speech. Recent developments in genomics has also adopted these pre-training methods for genome understanding. However, they focus only on understanding haploid sequences, which hinders their applicability towards understanding genetic variations, also known as single nucleotide polymorphisms (SNPs), which is crucial for genome-wide association study. In this paper, we introduce SNP2Vec, a scalable self-supervised pre-training approach for understanding SNP. We apply SNP2Vec to perform long-sequence genomics modeling, and we evaluate the effectiveness of our approach on predicting Alzheimer's disease risk in a Chinese cohort. Our approach significantly outperforms existing polygenic risk score methods and all other baselines, including the model that is trained entirely with haploid sequences. We release our code and dataset on https://github.com/HLTCHKUST/snp2vec.
    Data Augmentation for Bayesian Deep Learning. (arXiv:1903.09668v3 [stat.ML] UPDATED)
    Deep Learning (DL) methods have emerged as one of the most powerful tools for functional approximation and prediction. While the representation properties of DL have been well studied, uncertainty quantification remains challenging and largely unexplored. Data augmentation techniques are a natural approach to provide uncertainty quantification and to incorporate stochastic Monte Carlo search into stochastic gradient descent (SGD) methods. The purpose of our paper is to show that training DL architectures with data augmentation leads to efficiency gains. We use the theory of scale mixtures of normals to derive data augmentation strategies for deep learning. This allows variants of the expectation-maximization and MCMC algorithms to be brought to bear on these high dimensional nonlinear deep learning models. To demonstrate our methodology, we develop data augmentation algorithms for a variety of commonly used activation functions: logit, ReLU, leaky ReLU and SVM. Our methodology is compared to traditional stochastic gradient descent with back-propagation. Our optimization procedure leads to a version of iteratively re-weighted least squares and can be implemented at scale with accelerated linear algebra methods providing substantial improvement in speed. We illustrate our methodology on a number of standard datasets. Finally, we conclude with directions for future research.
    Twitter User Representation Using Weakly Supervised Graph Embedding. (arXiv:2108.08988v3 [cs.CL] UPDATED)
    Social media platforms provide convenient means for users to participate in multiple online activities on various contents and create fast widespread interactions. However, this rapidly growing access has also increased the diverse information, and characterizing user types to understand people's lifestyle decisions shared in social media is challenging. In this paper, we propose a weakly supervised graph embedding based framework for understanding user types. We evaluate the user embedding learned using weak supervision over well-being related tweets from Twitter, focusing on 'Yoga', 'Keto diet'. Experiments on real-world datasets demonstrate that the proposed framework outperforms the baselines for detecting user types. Finally, we illustrate data analysis on different types of users (e.g., practitioner vs. promotional) from our dataset. While we focus on lifestyle-related tweets (i.e., yoga, keto), our method for constructing user representation readily generalizes to other domains.
    ICSML: Industrial Control Systems Machine Learning Inference Framework natively executing on IEC 61131-3 compliant devices. (arXiv:2202.10075v2 [cs.LG] UPDATED)
    Industrial Control Systems (ICS) have played a catalytic role in enabling the 4th Industrial Revolution. ICS devices like Programmable Logic Controllers (PLCs), automate, monitor, and control critical processes in industrial, energy, and commercial environments. The convergence of traditional Operational Technology (OT) with Information Technology (IT) has opened a new and unique threat landscape. This has inspired defense research that focuses heavily on Machine Learning (ML) based anomaly detection methods that run on external IT hardware, which means an increase in costs and the further expansion of the threat landscape. To remove this requirement, we introduce the ICS machine learning inference framework (ICSML) which enables the execution of ML model inference natively on the PLC. ICSML is implemented in IEC 61131-3 code and provides several optimizations to bypass the limitations imposed by the domain-specific languages. Therefore, it works \emph{on every PLC without the need for vendor support}. ICSML provides a complete set of components for the creation of full ML models similarly to established ML frameworks. We run a series of benchmarks studying memory and performance and compare our solution to the TFLite inference framework. At the same time, we develop domain-specific model optimizations to improve the efficiency of ICSML. To demonstrate the abilities of ICSML, we evaluate a case study of a real defense for process-aware attacks targeting a desalination plant.
    Measurement-based Admission Control in Sliced Networks: A Best Arm Identification Approach. (arXiv:2204.06910v1 [cs.NI])
    In sliced networks, the shared tenancy of slices requires adaptive admission control of data flows, based on measurements of network resources. In this paper, we investigate the design of measurement-based admission control schemes, deciding whether a new data flow can be admitted and in this case, on which slice. The objective is to devise a joint measurement and decision strategy that returns a correct decision (e.g., the least loaded slice) with a certain level of confidence while minimizing the measurement cost (the number of measurements made before committing to the decision). We study the design of such strategies for several natural admission criteria specifying what a correct decision is. For each of these criteria, using tools from best arm identification in bandits, we first derive an explicit information-theoretical lower bound on the cost of any algorithm returning the correct decision with fixed confidence. We then devise a joint measurement and decision strategy achieving this theoretical limit. We compare empirically the measurement costs of these strategies, and compare them both to the lower bounds as well as a naive measurement scheme. We find that our algorithm significantly outperforms the naive scheme (by a factor $2-8$).
    Surface Similarity Parameter: A New Machine Learning Loss Metric for Oscillatory Spatio-Temporal Data. (arXiv:2204.06843v1 [cs.LG])
    Supervised machine learning approaches require the formulation of a loss functional to be minimized in the training phase. Sequential data are ubiquitous across many fields of research, and are often treated with Euclidean distance-based loss functions that were designed for tabular data. For smooth oscillatory data, those conventional approaches lack the ability to penalize amplitude, frequency and phase prediction errors at the same time, and tend to be biased towards amplitude errors. We introduce the surface similarity parameter (SSP) as a novel loss function that is especially useful for training machine learning models on smooth oscillatory sequences. Our extensive experiments on chaotic spatio-temporal dynamical systems indicate that the SSP is beneficial for shaping gradients, thereby accelerating the training process, reducing the final prediction error, and implementing a stronger regularization effect compared to using classical loss functions. The results indicate the potential of the novel loss metric particularly for highly complex and chaotic data, such as data stemming from the nonlinear two-dimensional Kuramoto-Sivashinsky equation and the linear propagation of dispersive surface gravity waves in fluids.
    Exploring the Distributed Knowledge Congruence in Proxy-data-free Federated Distillation. (arXiv:2204.07028v1 [cs.LG])
    Federated learning (FL) is a distributed machine learning paradigm in which the server periodically aggregates local model parameters from clients without assembling their private data. User-constrained communication bandwidth and the requirement for personalized models pose severe challenges to FL. Federated distillation (FD) is proposed to simultaneously address the two problems, which exchanges knowledge between the server and clients, supporting heterogeneous local models while significantly reducing communication overhead. However, most existing FD methods require a proxy dataset, which is often unavailable. Proxy-data-free FD approaches eliminate the need for additional public data beyond clients' private data, but suffer from remarkable discrepancy among local knowledge due to model heterogeneity, leading to ambiguous representation on the server and inevitable accuracy degradation. To tackle this issue, we propose a proxy-data-free FD algorithm based on distributed knowledge congruence (FedDKC). FedDKC leverages well-designed refinement strategies to narrow local knowledge differences into an acceptable upper bound to mitigate the negative effects of knowledge incongruence. Specifically, from perspectives of peak probability and Shannon entropy of local knowledge, we design kernel-based knowledge refinement (KKR) and searching-based knowledge refinement (SKR) respectively, and theoretically guarantee the refined-local knowledge can satisfy an approximately-similar distribution and be regarded as congruent. Extensive experiments conducted on three common datasets demonstrate that our proposed FedDKC method outperforms the state-of-the-art in 93.33% of comparisons, and achieves faster convergence without increasing communication overhead.
    Assessing the communication gap between AI models and healthcare professionals: explainability, utility and trust in AI-driven clinical decision-making. (arXiv:2204.05030v2 [cs.AI] UPDATED)
    This paper contributes with a pragmatic evaluation framework for explainable Machine Learning (ML) models for clinical decision support. The study revealed a more nuanced role for ML explanation models, when these are pragmatically embedded in the clinical context. Despite the general positive attitude of healthcare professionals (HCPs) towards explanations as a safety and trust mechanism, for a significant set of participants there were negative effects associated with confirmation bias, accentuating model over-reliance and increased effort to interact with the model. Also, contradicting one of its main intended functions, standard explanatory models showed limited ability to support a critical understanding of the limitations of the model. However, we found new significant positive effects which repositions the role of explanations within a clinical context: these include reduction of automation bias, addressing ambiguous clinical cases (cases where HCPs were not certain about their decision) and support of less experienced HCPs in the acquisition of new domain knowledge.
    The multi-modal universe of fast-fashion: the Visuelle 2.0 benchmark. (arXiv:2204.06972v1 [cs.CV])
    We present Visuelle 2.0, the first dataset useful for facing diverse prediction problems that a fast-fashion company has to manage routinely. Furthermore, we demonstrate how the use of computer vision is substantial in this scenario. Visuelle 2.0 contains data for 6 seasons / 5355 clothing products of Nuna Lie, a famous Italian company with hundreds of shops located in different areas within the country. In particular, we focus on a specific prediction problem, namely short-observation new product sale forecasting (SO-fore). SO-fore assumes that the season has started and a set of new products is on the shelves of the different stores. The goal is to forecast the sales for a particular horizon, given a short, available past (few weeks), since no earlier statistics are available. To be successful, SO-fore approaches should capture this short past and exploit other modalities or exogenous data. To these aims, Visuelle 2.0 is equipped with disaggregated data at the item-shop level and multi-modal information for each clothing item, allowing computer vision approaches to come into play. The main message that we deliver is that the use of image data with deep networks boosts performances obtained when using the time series in long-term forecasting scenarios, ameliorating the WAPE by 8.2% and the MAE by 7.7%. The dataset is available at: https://humaticslab.github.io/forecasting/visuelle.
    Optimal Stopping via Randomized Neural Networks. (arXiv:2104.13669v2 [stat.ML] UPDATED)
    This paper presents new machine learning approaches to approximate the solutions of optimal stopping problems. The key idea of these methods is to use neural networks, where the parameters of the hidden layers are generated randomly and only the last layer is trained, in order to approximate the continuation value. Our approaches are applicable to high dimensional problems where the existing approaches become increasingly impractical. In addition, since our approaches can be optimized using simple linear regression, they are easy to implement and theoretical guarantees are provided. Our randomized reinforcement learning approach and randomized recurrent neural network approach outperform the state-of-the-art and other relevant machine learning approaches in Markovian and non-Markovian examples, respectively. In particular, we test our approaches on Black-Scholes, Heston, rough Heston and fractional Brownian motion. Moreover, we show that they can also be used to efficiently compute Greeks of American options.
    HCR-Net: A deep learning based script independent handwritten character recognition network. (arXiv:2108.06663v2 [cs.CV] UPDATED)
    Despite being studied extensively for a few decades, handwritten character recognition (HCR) is considered a challenging learning problem in pattern recognition and there is very limited research on script independent models. This is mainly because of diversity of scripts, focus of the conventional research on handcrafted feature extraction techniques, and unavailability of public datasets and codes to reproduce the results. On the other hand, deep learning has witnessed huge success in different areas of pattern recognition, including HCR, and provides end-to-end learning but it has been studied for specific scripts only. In this paper, we have proposed a novel deep learning architecture which exploits transfer learning and image-augmentation for end-to-end learning for script independent handwritten character recognition, called HCR-Net. HCR-Net is based on a novel transfer learning approach for HCR, where some of lower layers of a pre-trained network are utilized. Due to transfer learning and image-augmentation, HCR-Net provides faster training, better performance and better generalizations, and can achieve up to 99\% results of its final accuracy in just first epoch. The experimental results on publicly available datasets of Bangla, Punjabi, Hindi, English, Swedish, Urdu, Farsi, Tibetan, Kannada, Malayalam, Telugu, Marathi, Nepali and Arabic languages prove the efficacy of HCR-Net and establishes several new benchmarks. For reproducibility of the results and for the advancements of the HCR research, complete code is publicly released at https://github.com/jmdvinodjmd/HCR-Net.
    A Study of Low-Resource Speech Commands Recognition based on Adversarial Reprogramming. (arXiv:2110.03894v2 [eess.AS] UPDATED)
    In this study, we propose a novel adversarial reprogramming (AR) approach for low-resource spoken command recognition (SCR), and build an AR-SCR system. The AR procedure aims to modify the acoustic signals (from the target domain) to repurpose a pretrained SCR model (from the source domain). To solve the label mismatches between source and target domains, and further improve the stability of AR, we propose a novel similarity-based label mapping technique to align classes. In addition, the transfer learning (TL) technique is combined with the original AR process to improve the model adaptation capability. We evaluate the proposed AR-SCR system on three low-resource SCR datasets, including Arabic, Lithuanian, and dysarthric Mandarin speech. Experimental results show that with a pretrained AM trained on a large-scale English dataset, the proposed AR-SCR system outperforms the current state-of-the-art results on Arabic and Lithuanian speech commands datasets, with only a limited amount of training data.
    Regret, stability & fairness in matching markets with bandit learners. (arXiv:2102.06246v2 [cs.LG] UPDATED)
    Making an informed decision -- for example, when choosing a career or housing -- requires knowledge about the available options. Such knowledge is generally acquired through costly trial and error, but this learning process can be disrupted by competition. In this work, we study how competition affects the long-term outcomes of individuals as they learn. We build on a line of work that models this setting as a two-sided matching market with bandit learners. A recent result in this area states that it is impossible to simultaneously guarantee two natural desiderata: stability and low optimal regret for all agents. Resource-allocating platforms can point to this result as a justification for assigning good long-term outcomes to some agents and poor ones to others. We show that this impossibility need not hold true. In particular, by modeling two additional components of competition -- namely, costs and transfers -- we prove that it is possible to simultaneously guarantee four desiderata: stability, low optimal regret, fairness in the distribution of regret, and high social welfare.
    Real-time Adversarial Perturbations against Deep Reinforcement Learning Policies: Attacks and Defenses. (arXiv:2106.08746v3 [cs.LG] UPDATED)
    Recent work has shown that deep reinforcement learning (DRL) policies are vulnerable to adversarial perturbations. Adversaries can mislead policies of DRL agents by perturbing the state of the environment observed by the agents. Existing attacks are feasible in principle but face challenges in practice, either by being too slow to fool DRL policies in real time or by modifying past observations stored in the agent's memory. We show that using the Universal Adversarial Perturbation (UAP) method to compute perturbations, independent of the individual inputs to which they are applied to, can fool DRL policies effectively and in real time. We describe three such attack variants. Via an extensive evaluation using three Atari 2600 games, we show that our attacks are effective, as they fully degrade the performance of three different DRL agents (up to 100%, even when the $l_\infty$ bound on the perturbation is as small as 0.01). It is faster compared to the response time (0.6ms on average) of different DRL policies, and considerably faster than prior attacks using adversarial perturbations (1.8ms on average). We also show that our attack technique is efficient, incurring an online computational cost of 0.027ms on average. Using two further tasks involving robotic movement, we confirm that our results generalize to more complex DRL tasks. Furthermore, we demonstrate that the effectiveness of known defenses diminishes against universal perturbations. We propose an effective technique that detects all known adversarial perturbations against DRL policies, including all the universal perturbations presented in this paper.
    Extracting Finite Automata from RNNs Using State Merging. (arXiv:2201.12451v3 [cs.LG] UPDATED)
    One way to interpret the behavior of a blackbox recurrent neural network (RNN) is to extract from it a more interpretable discrete computational model, like a finite state machine, that captures its behavior. In this work, we propose a new method for extracting finite automata from RNNs inspired by the state merging paradigm from grammatical inference. We demonstrate the effectiveness of our method on the Tomita languages benchmark, where we find that it is able to extract faithful automata from RNNs trained on all languages in the benchmark. We find that extraction performance is aided by the number of data provided during the extraction process, as well as, curiously, whether the RNN model is trained for additional epochs after perfectly learning its target language. We use our method to analyze this phenomenon, finding that training beyond convergence is useful because it leads to compression of the internal state space of the RNN. This finding demonstrates how our method can be used for interpretability and analysis of trained RNN models.
    SkillNet: A Sparsely Activated Model for General-Purpose Natural Language Understanding. (arXiv:2203.03312v2 [cs.CL] UPDATED)
    Prevailing deep models are single-purpose and overspecialize at individual tasks. However, when being extended to new tasks, they typically forget previously learned skills and learn from scratch. We address this issue by introducing SkillNet, a general-purpose model that stitches together existing skills to learn new tasks more effectively. The key feature of our approach is that it is sparsely activated guided by predefined skills. Different from traditional dense models that always activate all the model parameters, SkillNet only activates parts of the model parameters whose skills are relevant to the target task. When learning for a new task, our approach precisely activates required skills and also provides an option to add new skills. We evaluate on natural language understandings tasks and have the following findings. First, with only one model checkpoint, SkillNet performs better than task-specific fine-tuning and two multi-task learning baselines (i.e., dense model and Mixture-of-Experts model) on six tasks. Second, sparsely activated pre-training further improves the overall performance. Third, SkillNet significantly outperforms baseline systems when being extended to new tasks.
    Fine-Grained Population Mobility Data-Based Community-Level COVID-19 Prediction Model. (arXiv:2202.06257v2 [cs.LG] UPDATED)
    Predicting the number of infections in the anti-epidemic process is extremely beneficial to the government in developing anti-epidemic strategies, especially in fine-grained geographic units. Previous works focus on low spatial resolution prediction, e.g., county-level, and preprocess data to the same geographic level, which loses some useful information. In this paper, we propose a fine-grained population mobility data-based model (FGC-COVID) utilizing data of two geographic levels for community-level COVID-19 prediction. We use the population mobility data between Census Block Groups (CBGs), which is a finer-grained geographic level than community, to build the graph and capture the dependencies between CBGs using graph neural networks (GNNs). To mine as finer-grained patterns as possible for prediction, a spatial weighted aggregation module is introduced to aggregate the embeddings of CBGs to community level based on their geographic affiliation and spatial autocorrelation. Extensive experiments on 300 days LA city COVID-19 data indicate our model outperforms existing forecasting models on community-level COVID-19 prediction.
    Semi-Discriminative Representation Loss for Online Continual Learning. (arXiv:2006.11234v4 [stat.ML] UPDATED)
    The use of episodic memory in continual learning has demonstrated effectiveness for alleviating catastrophic forgetting. In recent studies, gradient-based approaches have been developed to make more efficient use of compact episodic memory. Such approaches refine the gradients resulting from new samples by those from memorized samples, aiming to reduce the diversity of gradients from different tasks. In this paper, we clarify the relation between diversity of gradients and discriminativeness of representations, showing shared as well as conflicting interests between Deep Metric Learning and continual learning, thus demonstrating pros and cons of learning discriminative representations in continual learning. Based on these findings, we propose a simple method -- Semi-Discriminative Representation Loss (SDRL) -- for continual learning. In comparison with state-of-the-art methods, SDRL shows better performance with low computational cost on multiple benchmark tasks in the setting of online continual learning.
    DeePN$^2$: A deep learning-based non-Newtonian hydrodynamic model. (arXiv:2112.14798v3 [physics.comp-ph] UPDATED)
    A long standing problem in the modeling of non-Newtonian hydrodynamics of polymeric flows is the availability of reliable and interpretable hydrodynamic models that faithfully encode the underlying micro-scale polymer dynamics. The main complication arises from the long polymer relaxation time, the complex molecular structure and heterogeneous interaction. DeePN$^2$, a deep learning-based non-Newtonian hydrodynamic model, has been proposed and has shown some success in systematically passing the micro-scale structural mechanics information to the macro-scale hydrodynamics for suspensions with simple polymer conformation and bond potential. The model retains a multi-scaled nature by mapping the polymer configurations into a set of symmetry-preserving macro-scale features. The extended constitutive laws for these macro-scale features can be directly learned from the kinetics of their micro-scale counterparts. In this paper, we develop DeePN$^2$ using more complex micro-structural models. We show that DeePN$^2$ can faithfully capture the broadly overlooked viscoelastic differences arising from the specific molecular structural mechanics without human intervention.
    Stream-based Active Learning with Verification Latency in Non-stationary Environments. (arXiv:2204.06822v1 [cs.LG])
    Data stream classification is an important problem in the field of machine learning. Due to the non-stationary nature of the data where the underlying distribution changes over time (concept drift), the model needs to continuously adapt to new data statistics. Stream-based Active Learning (AL) approaches address this problem by interactively querying a human expert to provide new data labels for the most recent samples, within a limited budget. Existing AL strategies assume that labels are immediately available, while in a real-world scenario the expert requires time to provide a queried label (verification latency), and by the time the requested labels arrive they may not be relevant anymore. In this article, we investigate the influence of finite, time-variable, and unknown verification delay, in the presence of concept drift on AL approaches. We propose PRopagate (PR), a latency independent utility estimator which also predicts the requested, but not yet known, labels. Furthermore, we propose a drift-dependent dynamic budget strategy, which uses a variable distribution of the labelling budget over time, after a detected drift. Thorough experimental evaluation, with both synthetic and real-world non-stationary datasets, and different settings of verification latency and budget are conducted and analyzed. We empirically show that the proposed method consistently outperforms the state-of-the-art. Additionally, we demonstrate that with variable budget allocation in time, it is possible to boost the performance of AL strategies, without increasing the overall labeling budget.
    LEFM-Nets: Learnable Explicit Feature Map Deep Networks for Segmentation of Histopathological Images of Frozen Sections. (arXiv:2204.06955v1 [eess.IV])
    Accurate segmentation of medical images is essential for diagnosis and treatment of diseases. These problems are solved by highly complex models, such as deep networks (DN), requiring a large amount of labeled data for training. Thereby, many DNs possess task- or imaging modality specific architectures with a decision-making process that is often hard to explain and interpret. Here, we propose a framework that embeds existing DNs into a low-dimensional subspace induced by the learnable explicit feature map (LEFM) layer. Compared to the existing DN, the framework adds one hyperparameter and only modestly increase the number of learnable parameters. The method is aimed at, but not limited to, segmentation of low-dimensional medical images, such as color histopathological images of stained frozen sections. Since features in the LEFM layer are polynomial functions of the original features, proposed LEFM-Nets contribute to the interpretability of network decisions. In this work, we combined LEFM with the known networks: DeepLabv3+, UNet, UNet++ and MA-net. New LEFM-Nets are applied to the segmentation of adenocarcinoma of a colon in a liver from images of hematoxylin and eosin (H&E) stained frozen sections. LEFM-Nets are also tested on nuclei segmentation from images of H&E stained frozen sections of ten human organs. On the first problem, LEFM-Nets achieved statistically significant performance improvement in terms of micro balanced accuracy and $F_1$ score than original networks. LEFM-Nets achieved only better performance in comparison with the original networks on the second problem. The source code is available at https://github.com/dsitnik/lefm.
    A Level Set Theory for Neural Implicit Evolution under Explicit Flows. (arXiv:2204.07159v1 [cs.CV])
    Coordinate-based neural networks parameterizing implicit surfaces have emerged as efficient representations of geometry. They effectively act as parametric level sets with the zero-level set defining the surface of interest. We present a framework that allows applying deformation operations defined for triangle meshes onto such implicit surfaces. Several of these operations can be viewed as energy-minimization problems that induce an instantaneous flow field on the explicit surface. Our method uses the flow field to deform parametric implicit surfaces by extending the classical theory of level sets. We also derive a consolidated view for existing methods on differentiable surface extraction and rendering, by formalizing connections to the level-set theory. We show that these methods drift from the theory and that our approach exhibits improvements for applications like surface smoothing, mean-curvature flow, inverse rendering and user-defined editing on implicit geometry.
    Program Analysis of Probabilistic Programs. (arXiv:2204.06868v1 [cs.PL])
    Probabilistic programming is a growing area that strives to make statistical analysis more accessible, by separating probabilistic modelling from probabilistic inference. In practice this decoupling is difficult. No single inference algorithm can be used as a probabilistic programming back-end that is simultaneously reliable, efficient, black-box, and general. Probabilistic programming languages often choose a single algorithm to apply to a given problem, thus inheriting its limitations. While substantial work has been done both to formalise probabilistic programming and to improve efficiency of inference, there has been little work that makes use of the available program structure, by formally analysing it, to better utilise the underlying inference algorithm. This dissertation presents three novel techniques (both static and dynamic), which aim to improve probabilistic programming using program analysis. The techniques analyse a probabilistic program and adapt it to make inference more efficient, sometimes in a way that would have been tedious or impossible to do by hand.
    Global Counterfactual Explanations: Investigations, Implementations and Improvements. (arXiv:2204.06917v1 [cs.LG])
    Counterfactual explanations have been widely studied in explainability, with a range of application dependent methods emerging in fairness, recourse and model understanding. However, the major shortcoming associated with these methods is their inability to provide explanations beyond the local or instance-level. While some works touch upon the notion of a global explanation, typically suggesting to aggregate masses of local explanations in the hope of ascertaining global properties, few provide frameworks that are either reliable or computationally tractable. Meanwhile, practitioners are requesting more efficient and interactive explainability tools. We take this opportunity to investigate existing global methods, with a focus on implementing and improving Actionable Recourse Summaries (AReS), the only known global counterfactual explanation framework for recourse.
    METRO: Efficient Denoising Pretraining of Large Scale Autoencoding Language Models with Model Generated Signals. (arXiv:2204.06644v1 [cs.LG])
    We present an efficient method of pretraining large-scale autoencoding language models using training signals generated by an auxiliary model. Originated in ELECTRA, this training strategy has demonstrated sample-efficiency to pretrain models at the scale of hundreds of millions of parameters. In this work, we conduct a comprehensive empirical study, and propose a recipe, namely "Model generated dEnoising TRaining Objective" (METRO), which incorporates some of the best modeling techniques developed recently to speed up, stabilize, and enhance pretrained language models without compromising model effectiveness. The resultant models, METRO-LM, consisting of up to 5.4 billion parameters, achieve new state-of-the-art on the GLUE, SuperGLUE, and SQuAD benchmarks. More importantly, METRO-LM are efficient in that they often outperform previous large models with significantly smaller model sizes and lower pretraining cost.
    Performance Assessment of different Machine Learning Algorithm for Life-Time Prediction of Solder Joints based on Synthetic Data. (arXiv:2204.06627v1 [cs.LG])
    This paper proposes a computationally efficient methodology to predict the damage progression in solder contacts of electronic components using temperature-time curves. For this purpose, two machine learning algorithms, a Multilayer Perceptron and a Long Short-Term Memory network, are trained and compared with respect to their prediction accuracy and the required amount of training data. The training is performed using synthetic, normally distributed data that is realistic for automotive applications. A finite element model of a simple bipolar chip resistor in surface mount technology configuration is used to numerically compute the synthetic data. As a result, both machine learning algorithms show a relevant accuracy for the prediction of accumulated creep strains. With a training data length of 350 hours (12.5% of the available training data), both models show a constantly good fitting performance of $R^2$ of 0.72 for the Multilayer Perceptron and $R^2$ of 0.87 for the Long Short-Term Memory network. The prediction errors of the accumulated creep strains are less than 10% with an amount of 350 hours training data and decreases to less than 5 % when using further data. Therefore, both approaches are promising for the lifetime prediction directly on the electronic device.
    Wassmap: Wasserstein Isometric Mapping for Image Manifold Learning. (arXiv:2204.06645v1 [cs.LG])
    In this paper, we propose Wasserstein Isometric Mapping (Wassmap), a parameter-free nonlinear dimensionality reduction technique that provides solutions to some drawbacks in existing global nonlinear dimensionality reduction algorithms in imaging applications. Wassmap represents images via probability measures in Wasserstein space, then uses pairwise quadratic Wasserstein distances between the associated measures to produce a low-dimensional, approximately isometric embedding. We show that the algorithm is able to exactly recover parameters of some image manifolds including those generated by translations or dilations of a fixed generating measure. Additionally, we show that a discrete version of the algorithm retrieves parameters from manifolds generated from discrete measures by providing a theoretical bridge to transfer recovery results from functional data to discrete data. Testing of the proposed algorithms on various image data manifolds show that Wassmap yields good embeddings compared with other global techniques.
    Question rewriting? Assessing its importance for conversational question answering. (arXiv:2201.09146v2 [cs.CL] UPDATED)
    In conversational question answering, systems must correctly interpret the interconnected interactions and generate knowledgeable answers, which may require the retrieval of relevant information from a background repository. Recent approaches to this problem leverage neural language models, although different alternatives can be considered in terms of modules for (a) representing user questions in context, (b) retrieving the relevant background information, and (c) generating the answer. This work presents a conversational question answering system designed specifically for the Search-Oriented Conversational AI (SCAI) shared task, and reports on a detailed analysis of its question rewriting module. In particular, we considered different variations of the question rewriting module to evaluate the influence on the subsequent components, and performed a careful analysis of the results obtained with the best system configuration. Our system achieved the best performance in the shared task and our analysis emphasizes the importance of the conversation context representation for the overall system performance.
    Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms. (arXiv:2202.13001v3 [cs.LG] UPDATED)
    We study a sequential decision problem where the learner faces a sequence of $K$-armed stochastic bandit tasks. The tasks may be designed by an adversary, but the adversary is constrained to choose the optimal arm of each task in a smaller (but unknown) subset of $M$ arms. The task boundaries might be known (the bandit meta-learning setting), or unknown (the non-stationary bandit setting), and the number of tasks $N$ as well as the total number of rounds $T$ are known ($N$ could be unknown in the meta-learning setting). We design an algorithm based on a reduction to bandit submodular maximization, and show that its regret in both settings is smaller than the simple baseline of $\tilde{O}(\sqrt{KNT})$ that can be obtained by using standard algorithms designed for non-stationary bandit problems. For the bandit meta-learning problem with fixed task length $\tau$, we show that the regret of the algorithm is bounded as $\tilde{O}(N\sqrt{M \tau}+N^{2/3})$. Under additional assumptions on the identifiability of the optimal arms in each task, we show a bandit meta-learning algorithm with an improved $\tilde{O}(N\sqrt{M \tau}+N^{1/2})$ regret.
    LSTM-Autoencoder based Anomaly Detection for Indoor Air Quality Time Series Data. (arXiv:2204.06701v1 [cs.LG])
    Anomaly detection for indoor air quality (IAQ) data has become an important area of research as the quality of air is closely related to human health and well-being. However, traditional statistics and shallow machine learning-based approaches in anomaly detection in the IAQ area could not detect anomalies involving the observation of correlations across several data points (i.e., often referred to as long-term dependences). We propose a hybrid deep learning model that combines LSTM with Autoencoder for anomaly detection tasks in IAQ to address this issue. In our approach, the LSTM network is comprised of multiple LSTM cells that work with each other to learn the long-term dependences of the data in a time-series sequence. Autoencoder identifies the optimal threshold based on the reconstruction loss rates evaluated on every data across all time-series sequences. Our experimental results, based on the Dunedin CO2 time-series dataset obtained through a real-world deployment of the schools in New Zealand, demonstrate a very high and robust accuracy rate (99.50%) that outperforms other similar models.
    BottleFit: Learning Compressed Representations in Deep Neural Networks for Effective and Efficient Split Computing. (arXiv:2201.02693v2 [cs.LG] UPDATED)
    Although mission-critical applications require the use of deep neural networks (DNNs), their continuous execution at mobile devices results in a significant increase in energy consumption. While edge offloading can decrease energy consumption, erratic patterns in channel quality, network and edge server load can lead to severe disruption of the system's key operations. An alternative approach, called split computing, generates compressed representations within the model (called "bottlenecks"), to reduce bandwidth usage and energy consumption. Prior work has proposed approaches that introduce additional layers, to the detriment of energy consumption and latency. For this reason, we propose a new framework called BottleFit, which, in addition to targeted DNN architecture modifications, includes a novel training strategy to achieve high accuracy even with strong compression rates. We apply BottleFit on cutting-edge DNN models in image classification, and show that BottleFit achieves 77.1% data compression with up to 0.6% accuracy loss on ImageNet dataset, while state of the art such as SPINN loses up to 6% in accuracy. We experimentally measure the power consumption and latency of an image classification application running on an NVIDIA Jetson Nano board (GPU-based) and a Raspberry PI board (GPU-less). We show that BottleFit decreases power consumption and latency respectively by up to 49% and 89% with respect to (w.r.t.) local computing and by 37% and 55% w.r.t. edge offloading. We also compare BottleFit with state-of-the-art autoencoders-based approaches, and show that (i) BottleFit reduces power consumption and execution time respectively by up to 54% and 44% on the Jetson and 40% and 62% on Raspberry PI; (ii) the size of the head model executed on the mobile device is 83 times smaller. We publish the code repository for reproducibility of the results in this study.
    A Natural Language Processing Approach for Instruction Set Architecture Identification. (arXiv:2204.06624v1 [cs.CR])
    Binary analysis of software is a critical step in cyber forensics applications such as program vulnerability assessment and malware detection. This involves interpreting instructions executed by software and often necessitates converting the software's binary file data to assembly language. The conversion process requires information about the binary file's target instruction set architecture (ISA). However, ISA information might not be included in binary files due to compilation errors, partial downloads, or adversarial corruption of file metadata. Machine learning (ML) is a promising methodology that can be used to identify the target ISA using binary data in the object code section of binary files. In this paper we propose a binary code feature extraction model to improve the accuracy and scalability of ML-based ISA identification methods. Our feature extraction model can be used in the absence of domain knowledge about the ISAs. Specifically, we adapt models from natural language processing (NLP) to i) identify successive byte patterns commonly observed in binary codes, ii) estimate the significance of each byte pattern to a binary file, and iii) estimate the relevance of each byte pattern in distinguishing between ISAs. We introduce character-level features of encoded binaries to identify fine-grained bit patterns inherent to each ISA. We use a dataset with binaries from 12 different ISAs to evaluate our approach. Empirical evaluations show that using our byte-level features in ML-based ISA identification results in an 8% higher accuracy than the state-of-the-art features based on byte-histograms and byte pattern signatures. We observe that character-level features allow reducing the size of the feature set by up to 16x while maintaining accuracy above 97%.
    Sign Bit is Enough: A Learning Synchronization Framework for Multi-hop All-reduce with Ultimate Compression. (arXiv:2204.06787v1 [cs.LG])
    Traditional one-bit compressed stochastic gradient descent can not be directly employed in multi-hop all-reduce, a widely adopted distributed training paradigm in network-intensive high-performance computing systems such as public clouds. According to our theoretical findings, due to the cascading compression, the training process has considerable deterioration on the convergence performance. To overcome this limitation, we implement a sign-bit compression-based learning synchronization framework, Marsit. It prevents cascading compression via an elaborate bit-wise operation for unbiased sign aggregation and its specific global compensation mechanism for mitigating compression deviation. The proposed framework retains the same theoretical convergence rate as non-compression mechanisms. Experimental results demonstrate that Marsit reduces up to 35% training time while preserving the same accuracy as training without compression.
    Efficient and practical quantum compiler towards multi-qubit systems with deep reinforcement learning. (arXiv:2204.06904v1 [quant-ph])
    Efficient quantum compiling tactics greatly enhance the capability of quantum computers to execute complicated quantum algorithms. Due to its fundamental importance, a plethora of quantum compilers has been designed in past years. However, there are several caveats to current protocols, which are low optimality, high inference time, limited scalability, and lack of universality. To compensate for these defects, here we devise an efficient and practical quantum compiler assisted by advanced deep reinforcement learning (RL) techniques, i.e., data generation, deep Q-learning, and AQ* search. In this way, our protocol is compatible with various quantum machines and can be used to compile multi-qubit operators. We systematically evaluate the performance of our proposal in compiling quantum operators with both inverse-closed and inverse-free universal basis sets. In the task of single-qubit operator compiling, our proposal outperforms other RL-based quantum compilers in the measure of compiling sequence length and inference time. Meanwhile, the output solution is near-optimal, guaranteed by the Solovay-Kitaev theorem. Notably, for the inverse-free universal basis set, the achieved sequence length complexity is comparable with the inverse-based setting and dramatically advances previous methods. These empirical results contribute to improving the inverse-free Solovay-Kitaev theorem. In addition, for the first time, we demonstrate how to leverage RL-based quantum compilers to accomplish two-qubit operator compiling. The achieved results open an avenue for integrating RL with quantum compiling to unify efficiency and practicality and thus facilitate the exploration of quantum advantages.
    MIMO Channel Estimation using Score-Based Generative Models. (arXiv:2204.07122v1 [eess.SP])
    Channel estimation is a critical task in multiple-input multiple-output digital communications that has effects on end-to-end system performance. In this work, we introduce a novel approach for channel estimation using deep score-based generative models. These models are trained to estimate the gradient of the log-prior distribution, and can be used to iteratively refine estimates, given observed measurements of a signal. We introduce a framework for training score-based generative models for wireless channels, as well as performing channel estimation using posterior sampling at test time. We derive theoretical robustness guarantees of channel estimation with posterior sampling in single-input single-output scenarios, and show that the observations regarding estimation performance are verified experimentally in MIMO channels. Our results in simulated clustered delay line channels show competitive in-distribution performance without error floors in the high signal-to-noise ratio regime, and robust out-of-distribution performance, outperforming competing deep learning methods by up to 5 dB in end-to-end communication performance, while the complexity analysis reveals how model architecture can efficiently trade performance for estimation latency.
    data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language. (arXiv:2202.03555v2 [cs.LG] UPDATED)
    While the general idea of self-supervised learning is identical across modalities, the actual algorithms and objectives differ widely because they were developed with a single modality in mind. To get us closer to general self-supervised learning, we present data2vec, a framework that uses the same learning method for either speech, NLP or computer vision. The core idea is to predict latent representations of the full input data based on a masked view of the input in a self-distillation setup using a standard Transformer architecture. Instead of predicting modality-specific targets such as words, visual tokens or units of human speech which are local in nature, data2vec predicts contextualized latent representations that contain information from the entire input. Experiments on the major benchmarks of speech recognition, image classification, and natural language understanding demonstrate a new state of the art or competitive performance to predominant approaches.
    From Environmental Sound Representation to Robustness of 2D CNN Models Against Adversarial Attacks. (arXiv:2204.07018v1 [cs.SD])
    This paper investigates the impact of different standard environmental sound representations (spectrograms) on the recognition performance and adversarial attack robustness of a victim residual convolutional neural network, namely ResNet-18. Our main motivation for focusing on such a front-end classifier rather than other complex architectures is balancing recognition accuracy and the total number of training parameters. Herein, we measure the impact of different settings required for generating more informative Mel-frequency cepstral coefficient (MFCC), short-time Fourier transform (STFT), and discrete wavelet transform (DWT) representations on our front-end model. This measurement involves comparing the classification performance over the adversarial robustness. We demonstrate an inverse relationship between recognition accuracy and model robustness against six benchmarking attack algorithms on the balance of average budgets allocated by the adversary and the attack cost. Moreover, our experimental results have shown that while the ResNet-18 model trained on DWT spectrograms achieves a high recognition accuracy, attacking this model is relatively more costly for the adversary than other 2D representations. We also report some results on different convolutional neural network architectures such as ResNet-34, ResNet-56, AlexNet, and GoogLeNet, SB-CNN, and LSTM-based.  ( 2 min )
    Learning Convolutional Neural Networks in Frequency Domain. (arXiv:2204.06718v1 [cs.CV])
    Convolutional neural network (CNN) achieves impressive success in the field of computer vision during the past few decades. As the core of CNNs, image convolution operation helps CNNs to achieve good performance on image-related tasks. However, image convolution is hard to be implemented and parallelized. In this paper, we propose a novel neural network model, namely CEMNet, that can be trained in frequency domain. The most important motivation of this research is that we can use the very simple element-wise multiplication operation to replace the image convolution in frequency domain based on Cross-Correlation Theorem. We further introduce Weight Fixation Mechanism to alleviate over-fitting, and analyze the working behavior of Batch Normalization, Leaky ReLU and Dropout in frequency domain to design their counterparts for CEMNet. Also, to deal with complex inputs brought by DFT, we design two branch network structure for CEMNet. Experimental results imply that CEMNet works well in frequency domain, and achieve good performance on MNIST and CIFAR-10 databases. To our knowledge, CEMNet is the first model trained in Fourier Domain that achieves more than 70\% validation accuracy on CIFAR-10 database.  ( 2 min )
    Learning Optimal Dynamic Treatment Regimes Using Causal Tree Methods in Medicine. (arXiv:2204.07124v1 [stat.ML])
    Dynamic treatment regimes (DTRs) are used in medicine to tailor sequential treatment decisions to patients by considering patient heterogeneity. Common methods for learning optimal DTRs, however, have shortcomings: they are typically based on outcome prediction and not treatment effect estimation, or they use linear models that are restrictive for patient data from modern electronic health records. To address these shortcomings, we develop two novel methods for learning optimal DTRs that effectively handle complex patient data. We call our methods DTR-CT and DTR-CF. Our methods are based on a data-driven estimation of heterogeneous treatment effects using causal tree methods, specifically causal trees and causal forests, that learn non-linear relationships, control for time-varying confounding, are doubly robust, and explainable. To the best of our knowledge, our paper is the first that adapts causal tree methods for learning optimal DTRs. We evaluate our proposed methods using synthetic data and then apply them to real-world data from intensive care units. Our methods outperform state-of-the-art baselines in terms of cumulative regret and percentage of optimal decisions by a considerable margin. Our work improves treatment recommendations from electronic health record and is thus of direct relevance for personalized medicine.  ( 2 min )
    Joint Coreset Construction and Quantization for Distributed Machine Learning. (arXiv:2204.06652v1 [cs.LG])
    Coresets are small, weighted summaries of larger datasets, aiming at providing provable error bounds for machine learning (ML) tasks while significantly reducing the communication and computation costs. To achieve a better trade-off between ML error bounds and costs, we propose the first framework to incorporate quantization techniques into the process of coreset construction. Specifically, we theoretically analyze the ML error bounds caused by a combination of coreset construction and quantization. Based on that, we formulate an optimization problem to minimize the ML error under a fixed budget of communication cost. To improve the scalability for large datasets, we identify two proxies of the original objective function, for which efficient algorithms are developed. For the case of data on multiple nodes, we further design a novel algorithm to allocate the communication budget to the nodes while minimizing the overall ML error. Through extensive experiments on multiple real-world datasets, we demonstrate the effectiveness and efficiency of our proposed algorithms for a variety of ML tasks. In particular, our algorithms have achieved more than 90% data reduction with less than 10% degradation in ML performance in most cases.  ( 2 min )
    Control-oriented meta-learning. (arXiv:2204.06716v1 [cs.RO])
    Real-time adaptation is imperative to the control of robots operating in complex, dynamic environments. Adaptive control laws can endow even nonlinear systems with good trajectory tracking performance, provided that any uncertain dynamics terms are linearly parameterizable with known nonlinear features. However, it is often difficult to specify such features a priori, such as for aerodynamic disturbances on rotorcraft or interaction forces between a manipulator arm and various objects. In this paper, we turn to data-driven modeling with neural networks to learn, offline from past data, an adaptive controller with an internal parametric model of these nonlinear features. Our key insight is that we can better prepare the controller for deployment with control-oriented meta-learning of features in closed-loop simulation, rather than regression-oriented meta-learning of features to fit input-output data. Specifically, we meta-learn the adaptive controller with closed-loop tracking simulation as the base-learner and the average tracking error as the meta-objective. With both fully-actuated and underactuated nonlinear planar rotorcraft subject to wind, we demonstrate that our adaptive controller outperforms other controllers trained with regression-oriented meta-learning when deployed in closed-loop for trajectory tracking control.  ( 2 min )
    Leveraging convergence behavior to balance conflicting tasks in multi-task learning. (arXiv:2204.06698v1 [cs.LG])
    Multi-Task Learning is a learning paradigm that uses correlated tasks to improve performance generalization. A common way to learn multiple tasks is through the hard parameter sharing approach, in which a single architecture is used to share the same subset of parameters, creating an inductive bias between them during the training process. Due to its simplicity, potential to improve generalization, and reduce computational cost, it has gained the attention of the scientific and industrial communities. However, tasks often conflict with each other, which makes it challenging to define how the gradients of multiple tasks should be combined to allow simultaneous learning. To address this problem, we use the idea of multi-objective optimization to propose a method that takes into account temporal behaviour of the gradients to create a dynamic bias that adjust the importance of each task during the backpropagation. The result of this method is to give more attention to the tasks that are diverging or that are not being benefited during the last iterations, allowing to ensure that the simultaneous learning is heading to the performance maximization of all tasks. As a result, we empirically show that the proposed method outperforms the state-of-art approaches on learning conflicting tasks. Unlike the adopted baselines, our method ensures that all tasks reach good generalization performances.  ( 2 min )
    Multifidelity deep neural operators for efficient learning of partial differential equations with application to fast inverse design of nanoscale heat transport. (arXiv:2204.06684v1 [physics.comp-ph])
    Deep neural operators can learn operators mapping between infinite-dimensional function spaces via deep neural networks and have become an emerging paradigm of scientific machine learning. However, training neural operators usually requires a large amount of high-fidelity data, which is often difficult to obtain in real engineering problems. Here, we address this challenge by using multifidelity learning, i.e., learning from multifidelity datasets. We develop a multifidelity neural operator based on a deep operator network (DeepONet). A multifidelity DeepONet includes two standard DeepONets coupled by residual learning and input augmentation. Multifidelity DeepONet significantly reduces the required amount of high-fidelity data and achieves one order of magnitude smaller error when using the same amount of high-fidelity data. We apply a multifidelity DeepONet to learn the phonon Boltzmann transport equation (BTE), a framework to compute nanoscale heat transport. By combining a trained multifidelity DeepONet with genetic algorithm or topology optimization, we demonstrate a fast solver for the inverse design of BTE problems.  ( 2 min )
    Leveraging Natural Learning Processing to Uncover Themes in Clinical Notes of Patients Admitted for Heart Failure. (arXiv:2204.07074v1 [cs.LG])
    Heart failure occurs when the heart is not able to pump blood and oxygen to support other organs in the body as it should. Treatments include medications and sometimes hospitalization. Patients with heart failure can have both cardiovascular as well as non-cardiovascular comorbidities. Clinical notes of patients with heart failure can be analyzed to gain insight into the topics discussed in these notes and the major comorbidities in these patients. In this regard, we apply machine learning techniques, such as topic modeling, to identify the major themes found in the clinical notes specific to the procedures performed on 1,200 patients admitted for heart failure at the University of Illinois Hospital and Health Sciences System (UI Health). Topic modeling revealed five hidden themes in these clinical notes, including one related to heart disease comorbidities.  ( 2 min )
    The Vision of Self-Evolving Computing Systems. (arXiv:2204.06825v1 [cs.SE])
    Computing systems are omnipresent; their sustainability has become crucial for our society. A key aspect of this sustainability is the ability of computing systems to cope with the continuous change they face, ranging from dynamic operating conditions, to changing goals, and technological progress. While we are able to engineer smart computing systems that autonomously deal with various types of changes, handling unanticipated changes requires system evolution, which remains in essence a human-centered process. This will eventually become unmanageable. To break through the status quo, we put forward an arguable opinion for the vision of self-evolving computing systems that are equipped with an evolutionary engine enabling them to evolve autonomously. Specifically, when a self-evolving computing system detects conditions outside its operational domain, such as an anomaly or a new goal, it activates an evolutionary engine that runs online experiments to determine how the system needs to evolve to deal with the changes, thereby evolving its architecture. During this process the engine can integrate new computing elements that are provided by computing warehouses. These computing elements provide specifications and procedures enabling their automatic integration. We motivate the need for self-evolving computing systems in light of the state of the art, outline a conceptual architecture of self-evolving computing systems, and illustrate the architecture for a future smart city mobility system that needs to evolve continuously with changing conditions. To conclude, we highlight key research challenges to realize the vision of self-evolving computing systems.  ( 2 min )
    Sim-to-Real 6D Object Pose Estimation via Iterative Self-training for Robotic Bin-picking. (arXiv:2204.07049v1 [cs.RO])
    In this paper, we propose an iterative self-training framework for sim-to-real 6D object pose estimation to facilitate cost-effective robotic grasping. Given a bin-picking scenario, we establish a photo-realistic simulator to synthesize abundant virtual data, and use this to train an initial pose estimation network. This network then takes the role of a teacher model, which generates pose predictions for unlabeled real data. With these predictions, we further design a comprehensive adaptive selection scheme to distinguish reliable results, and leverage them as pseudo labels to update a student model for pose estimation on real data. To continuously improve the quality of pseudo labels, we iterate the above steps by taking the trained student model as a new teacher and re-label real data using the refined teacher model. We evaluate our method on a public benchmark and our newly-released dataset, achieving an ADD(-S) improvement of 11.49% and 22.62% respectively. Our method is also able to improve robotic bin-picking success by 19.54%, demonstrating the potential of iterative sim-to-real solutions for robotic applications.  ( 2 min )
    Magnetic Resonance Spectroscopy Deep Learning Denoising Using Few In Vivo Data. (arXiv:2101.11442v2 [physics.med-ph] UPDATED)
    Magnetic Resonance Spectroscopy (MRS) is a noninvasive tool to reveal metabolic information. One challenge of 1H-MRS is the low Signal-Noise Ratio (SNR). To improve the SNR, a typical approach is to perform Signal Averaging (SA) with M repeated samples. The data acquisition time, however, is increased by M times accordingly, and a complete clinical MRS scan takes approximately 10 minutes at a common setting M=128. Recently, deep learning has been introduced to improve the SNR but most of them use the simulated data as the training set. This may hinder the MRS applications since some potential differences, such as acquisition system imperfections, and physiological and psychologic conditions may exist between the simulated and in vivo data. Here, we proposed a new scheme that purely used the repeated samples of realistic data. A deep learning model, Refusion Long Short-Term Memory (ReLSTM), was designed to learn the mapping from the low SNR time-domain data (24 SA) to the high SNR one (128 SA). Experiments on the in vivo brain spectra of 7 healthy subjects, 2 brain tumor patients and 1 cerebral infarction patient showed that only using 20% repeated samples, the denoised spectra by ReLSTM could provide comparable estimated concentrations of metabolites to 128 SA. Compared with the state-of-the-art low-rank denoising method, the ReLSTM achieved the lower relative error and the Cram\'er-Rao lower bounds in quantifying some important biomarkers. In summary, ReLSTM can perform high-fidelity denoising of the spectra under fast acquisition (24 SA), which would be valuable to MRS clinical studies.  ( 2 min )
    Formal Language Recognition by Hard Attention Transformers: Perspectives from Circuit Complexity. (arXiv:2204.06618v1 [cs.CC])
    This paper analyzes three formal models of Transformer encoders that differ in the form of their self-attention mechanism: unique hard attention (UHAT); generalized unique hard attention (GUHAT), which generalizes UHAT; and averaging hard attention (AHAT). We show that UHAT and GUHAT Transformers, viewed as string acceptors, can only recognize formal languages in the complexity class AC$^0$, the class of languages recognizable by families of Boolean circuits of constant depth and polynomial size. This upper bound subsumes Hahn's (2020) results that GUHAT cannot recognize the DYCK languages or the PARITY language, since those languages are outside AC$^0$ (Furst et al., 1984). In contrast, the non-AC$^0$ languages MAJORITY and DYCK-1 are recognizable by AHAT networks, implying that AHAT can recognize languages that UHAT and GUHAT cannot.  ( 2 min )
    Network state Estimation using Raw Video Analysis: vQoS-GAN based non-intrusive Deep Learning Approach. (arXiv:2204.07062v1 [cs.MM])
    Content based providers transmits real time complex signal such as video data from one region to another. During this transmission process, the signals usually end up distorted or degraded where the actual information present in the video is lost. This normally happens in the streaming video services applications. Hence there is a need to know the level of degradation that happened in the receiver side. This video degradation can be estimated by network state parameters like data rate and packet loss values. Our proposed solution vQoS GAN (video Quality of Service Generative Adversarial Network) can estimate the network state parameters from the degraded received video data using a deep learning approach of semi supervised generative adversarial network algorithm. A robust and unique design of deep learning network model has been trained with the video data along with data rate and packet loss class labels and achieves over 95 percent of training accuracy. The proposed semi supervised generative adversarial network can additionally reconstruct the degraded video data to its original form for a better end user experience.  ( 2 min )
    Clifford Circuits can be Properly PAC Learned if and only if $\textsf{RP}=\textsf{NP}$. (arXiv:2204.06638v1 [quant-ph])
    Given a dataset of input states, measurements, and probabilities, is it possible to efficiently predict the measurement probabilities associated with a quantum circuit? Recent work of Caro and Datta (2020) studied the problem of PAC learning quantum circuits in an information theoretic sense, leaving open questions of computational efficiency. In particular, one candidate class of circuits for which an efficient learner might have been possible was that of Clifford circuits, since the corresponding set of states generated by such circuits, called stabilizer states, are known to be efficiently PAC learnable (Rocchetto 2018). Here we provide a negative result, showing that proper learning of CNOT circuits is hard for classical learners unless $\textsf{RP} = \textsf{NP}$. As the classical analogue and subset of Clifford circuits, this naturally leads to a hardness result for Clifford circuits as well. Additionally, we show that if $\textsf{RP} = \textsf{NP}$ then there would exist efficient proper learning algorithms for CNOT and Clifford circuits. By similar arguments, we also find that an efficient proper quantum learner for such circuits exists if and only if $\textsf{NP} \subseteq \textsf{RQP}$.  ( 2 min )
    EEG-ITNet: An Explainable Inception Temporal Convolutional Network for Motor Imagery Classification. (arXiv:2204.06947v1 [cs.LG])
    In recent years, neural networks and especially deep architectures have received substantial attention for EEG signal analysis in the field of brain-computer interfaces (BCIs). In this ongoing research area, the end-to-end models are more favoured than traditional approaches requiring signal transformation pre-classification. They can eliminate the need for prior information from experts and the extraction of handcrafted features. However, although several deep learning algorithms have been already proposed in the literature, achieving high accuracies for classifying motor movements or mental tasks, they often face a lack of interpretability and therefore are not quite favoured by the neuroscience community. The reasons behind this issue can be the high number of parameters and the sensitivity of deep neural networks to capture tiny yet unrelated discriminative features. We propose an end-to-end deep learning architecture called EEG-ITNet and a more comprehensible method to visualise the network learned patterns. Using inception modules and causal convolutions with dilation, our model can extract rich spectral, spatial, and temporal information from multi-channel EEG signals with less complexity (in terms of the number of trainable parameters) than other existing end-to-end architectures, such as EEG-Inception and EEG-TCNet. By an exhaustive evaluation on dataset 2a from BCI competition IV and OpenBMI motor imagery dataset, EEG-ITNet shows up to 5.9\% improvement in the classification accuracy in different scenarios with statistical significance compared to its competitors. We also comprehensively explain and support the validity of network illustration from a neuroscientific perspective. We have also made our code open at https://github.com/AbbasSalami/EEG-ITNet  ( 2 min )
    Sketch guided and progressive growing GAN for realistic and editable ultrasound image synthesis. (arXiv:2204.06929v1 [eess.IV])
    Ultrasound (US) imaging is widely used for anatomical structure inspection in clinical diagnosis. The training of new sonographers and deep learning based algorithms for US image analysis usually requires a large amount of data. However, obtaining and labeling large-scale US imaging data are not easy tasks, especially for diseases with low incidence. Realistic US image synthesis can alleviate this problem to a great extent. In this paper, we propose a generative adversarial network (GAN) based image synthesis framework. Our main contributions include: 1) we present the first work that can synthesize realistic B-mode US images with high-resolution and customized texture editing features; 2) to enhance structural details of generated images, we propose to introduce auxiliary sketch guidance into a conditional GAN. We superpose the edge sketch onto the object mask and use the composite mask as the network input; 3) to generate high-resolution US images, we adopt a progressive training strategy to gradually generate high-resolution images from low-resolution images. In addition, a feature loss is proposed to minimize the difference of high-level features between the generated and real images, which further improves the quality of generated images; 4) the proposed US image synthesis method is quite universal and can also be generalized to the US images of other anatomical structures besides the three ones tested in our study (lung, hip joint, and ovary); 5) extensive experiments on three large US image datasets are conducted to validate our method. Ablation studies, customized texture editing, user studies, and segmentation tests demonstrate promising results of our method in synthesizing realistic US images.  ( 2 min )
    Modularity benefits reinforcement learning agents with competing homeostatic drives. (arXiv:2204.06608v1 [cs.LG])
    The problem of balancing conflicting needs is fundamental to intelligence. Standard reinforcement learning algorithms maximize a scalar reward, which requires combining different objective-specific rewards into a single number. Alternatively, different objectives could also be combined at the level of action value, such that specialist modules responsible for different objectives submit different action suggestions to a decision process, each based on rewards that are independent of one another. In this work, we explore the potential benefits of this alternative strategy. We investigate a biologically relevant multi-objective problem, the continual homeostasis of a set of variables, and compare a monolithic deep Q-network to a modular network with a dedicated Q-learner for each variable. We find that the modular agent: a) requires minimal exogenously determined exploration; b) has improved sample efficiency; and c) is more robust to out-of-domain perturbation.  ( 2 min )
    Learning Invariances with Generalised Input-Convex Neural Networks. (arXiv:2204.07009v1 [cs.LG])
    Considering smooth mappings from input vectors to continuous targets, our goal is to characterise subspaces of the input domain, which are invariant under such mappings. Thus, we want to characterise manifolds implicitly defined by level sets. Specifically, this characterisation should be of a global parametric form, which is especially useful for different informed data exploration tasks, such as building grid-based approximations, sampling points along the level curves, or finding trajectories on the manifold. However, global parameterisations can only exist if the level sets are connected. For this purpose, we introduce a novel and flexible class of neural networks that generalise input-convex networks. These networks represent functions that are guaranteed to have connected level sets forming smooth manifolds on the input space. We further show that global parameterisations of these level sets can be always found efficiently. Lastly, we demonstrate that our novel technique for characterising invariances is a powerful generative data exploration tool in real-world applications, such as computational chemistry.  ( 2 min )
    Scalable and Robust Self-Learning for Skill Routing in Large-Scale Conversational AI Systems. (arXiv:2204.07135v1 [cs.LG])
    Skill routing is an important component in large-scale conversational systems. In contrast to traditional rule-based skill routing, state-of-the-art systems use a model-based approach to enable natural conversations. To provide supervision signal required to train such models, ideas such as human annotation, replication of a rule-based system, relabeling based on user paraphrases, and bandit-based learning were suggested. However, these approaches: (a) do not scale in terms of the number of skills and skill on-boarding, (b) require a very costly expert annotation/rule-design, (c) introduce risks in the user experience with each model update. In this paper, we present a scalable self-learning approach to explore routing alternatives without causing abrupt policy changes that break the user experience, learn from the user interaction, and incrementally improve the routing via frequent model refreshes. To enable such robust frequent model updates, we suggest a simple and effective approach that ensures controlled policy updates for individual domains, followed by an off-policy evaluation for making deployment decisions without any need for lengthy A/B experimentation. We conduct various offline and online A/B experiments on a commercial large-scale conversational system to demonstrate the effectiveness of the proposed method in real-world production settings.
    SVAM: Saliency-guided Visual Attention Modeling by Autonomous Underwater Robots. (arXiv:2011.06252v2 [cs.CV] UPDATED)
    This paper presents a holistic approach to saliency-guided visual attention modeling (SVAM) for use by autonomous underwater robots. Our proposed model, named SVAM-Net, integrates deep visual features at various scales and semantics for effective salient object detection (SOD) in natural underwater images. The SVAM-Net architecture is configured in a unique way to jointly accommodate bottom-up and top-down learning within two separate branches of the network while sharing the same encoding layers. We design dedicated spatial attention modules (SAMs) along these learning pathways to exploit the coarse-level and fine-level semantic features for SOD at four stages of abstractions. The bottom-up branch performs a rough yet reasonably accurate saliency estimation at a fast rate, whereas the deeper top-down branch incorporates a residual refinement module (RRM) that provides fine-grained localization of the salient objects. Extensive performance evaluation of SVAM-Net on benchmark datasets clearly demonstrates its effectiveness for underwater SOD. We also validate its generalization performance by several ocean trials' data that include test images of diverse underwater scenes and waterbodies, and also images with unseen natural objects. Moreover, we analyze its computational feasibility for robotic deployments and demonstrate its utility in several important use cases of visual attention modeling.
    Ensemble learning using individual neonatal data for seizure detection. (arXiv:2204.07043v1 [eess.SP])
    Sharing medical data between institutions is difficult in practice due to data protection laws and official procedures within institutions. Therefore, most existing algorithms are trained on relatively small electroencephalogram (EEG) data sets which is likely to be detrimental to prediction accuracy. In this work, we simulate a case when the data can not be shared by splitting the publicly available data set into disjoint sets representing data in individual institutions. We propose to train a (local) detector in each institution and aggregate their individual predictions into one final prediction. Four aggregation schemes are compared, namely, the majority vote, the mean, the weighted mean and the Dawid-Skene method. The approach allows different detector architectures amongst the institutions. The method was validated on an independent data set using only a subset of EEG channels. The ensemble reaches accuracy comparable to a single detector trained on all the data when sufficient amount of data is available in each institution. The weighted mean aggregation scheme showed best overall performance, it was only marginally outperformed by the Dawid-Skene method when local detectors approach performance of a single detector trained on all available data.
    Surrogate NAS Benchmarks: Going Beyond the Limited Search Spaces of Tabular NAS Benchmarks. (arXiv:2008.09777v4 [cs.LG] UPDATED)
    The most significant barrier to the advancement of Neural Architecture Search (NAS) is its demand for large computational resources, which hinders scientifically sound empirical evaluations of NAS methods. Tabular NAS benchmarks have alleviated this problem substantially, making it possible to properly evaluate NAS methods in seconds on commodity machines. However, an unintended consequence of tabular NAS benchmarks has been a focus on extremely small architectural search spaces since their construction relies on exhaustive evaluations of the space. This leads to unrealistic results that do not transfer to larger spaces. To overcome this fundamental limitation, we propose a methodology to create cheap NAS surrogate benchmarks for arbitrary search spaces. We exemplify this approach by creating surrogate NAS benchmarks on the existing tabular NAS-Bench-101 and on two widely used NAS search spaces with up to $10^{21}$ architectures ($10^{13}$ times larger than any previous tabular NAS benchmark). We show that surrogate NAS benchmarks can model the true performance of architectures better than tabular benchmarks (at a small fraction of the cost), that they lead to faithful estimates of how well different NAS methods work on the original non-surrogate benchmark, and that they can generate new scientific insight. We open-source all our code and believe that surrogate NAS benchmarks are an indispensable tool to extend scientifically sound work on NAS to large and exciting search spaces.
  • Open

    Incompleteness of graph convolutional neural networks for points clouds in three dimensions. (arXiv:2201.07136v2 [stat.ML] UPDATED)
    Graph neural networks (GNN) are very popular methods in machine learning and have been applied very successfully to the prediction of the properties of molecules and materials. First-order GNNs are well known to be incomplete, i.e., there exist graphs that are distinct but appear identical when seen through the lens of the GNN. More complicated schemes have thus been designed to increase their resolving power. Applications to molecules (and more generally, point clouds), however, add a geometric dimension to the problem. The most straightforward and prevalent approach to construct graph representation for molecules regards atoms as vertices in a graph and draws a bond between each pair of atoms within a chosen cutoff. Bonds can be decorated with the distance between atoms, and the resulting "distance graph NNs" (dGNN) have empirically demonstrated excellent resolving power and are widely used in chemical ML, with all known indistinguishable graphs being resolved in the fully-connected limit. Here we show that even for the restricted case of fully-connected graphs induced by 3D atom clouds dGNNs are not complete. We construct pairs of distinct point clouds that generate graphs that, for any cutoff radius, are equivalent based on a first-order Weisfeiler-Lehman test. This class of degenerate structures includes chemically-plausible configurations, setting an ultimate limit to the expressive power of some of the well-established GNN architectures for atomistic machine learning. Models that explicitly use angular or directional information in the description of atomic environments can resolve these degeneracies.
    Global Counterfactual Explanations: Investigations, Implementations and Improvements. (arXiv:2204.06917v1 [cs.LG])
    Counterfactual explanations have been widely studied in explainability, with a range of application dependent methods emerging in fairness, recourse and model understanding. However, the major shortcoming associated with these methods is their inability to provide explanations beyond the local or instance-level. While some works touch upon the notion of a global explanation, typically suggesting to aggregate masses of local explanations in the hope of ascertaining global properties, few provide frameworks that are either reliable or computationally tractable. Meanwhile, practitioners are requesting more efficient and interactive explainability tools. We take this opportunity to investigate existing global methods, with a focus on implementing and improving Actionable Recourse Summaries (AReS), the only known global counterfactual explanation framework for recourse.
    Sparse Interaction Neighborhood Selection for Markov Random Fields via Reversible Jump and Pseudoposteriors. (arXiv:2204.05933v2 [stat.CO] UPDATED)
    We consider the problem of estimating the interacting neighborhood of a Markov Random Field model with finite support and homogeneous pairwise interactions based on relative positions of a two-dimensional lattice. Using a Bayesian framework, we propose a Reversible Jump Monte Carlo Markov Chain algorithm that jumps across subsets of a maximal range neighborhood, allowing us to perform model selection based on a marginal pseudoposterior distribution of models.
    Improving Computational Complexity in Statistical Models with Second-Order Information. (arXiv:2202.04219v3 [stat.ML] UPDATED)
    It is known that when the statistical models are singular, i.e., the Fisher information matrix at the true parameter is degenerate, the fixed step-size gradient descent algorithm takes polynomial number of steps in terms of the sample size $n$ to converge to a final statistical radius around the true parameter, which can be unsatisfactory for the application. To further improve that computational complexity, we consider the utilization of the second-order information in the design of optimization algorithms. Specifically, we study the normalized gradient descent (NormGD) algorithm for solving parameter estimation in parametric statistical models, which is a variant of gradient descent algorithm whose step size is scaled by the maximum eigenvalue of the Hessian matrix of the empirical loss function of statistical models. When the population loss function, i.e., the limit of the empirical loss function when $n$ goes to infinity, is homogeneous in all directions, we demonstrate that the NormGD iterates reach a final statistical radius around the true parameter after a logarithmic number of iterations in terms of $n$. Therefore, for fixed dimension $d$, the NormGD algorithm achieves the optimal overall computational complexity $\mathcal{O}(n)$ to reach the final statistical radius. This computational complexity is cheaper than that of the fixed step-size gradient descent algorithm, which is of the order $\mathcal{O}(n^{\tau})$ for some $\tau > 1$, to reach the same statistical radius. We illustrate our general theory under two statistical models: generalized linear models and mixture models, and experimental results support our prediction with general theory.
    Ranking Feature-Block Importance in Artificial Multiblock Neural Networks. (arXiv:2109.10279v2 [cs.LG] UPDATED)
    In artificial neural networks, understanding the contributions of input features on the prediction fosters model explainability and delivers relevant information about the dataset. While typical setups for feature importance ranking assess input features individually, in this study, we go one step further and rank the importance of groups of features, denoted as feature-blocks. A feature-block can contain features of a specific type or features derived from a particular source, which are presented to the neural network in separate input branches (multiblock ANNs). This work presents three methods pursuing distinct strategies to rank features in multiblock ANNs by their importance: (1) a composite strategy building on individual feature importance rankings, (2) a knock-in, and (3) a knock-out strategy. While the composite strategy builds on state-of-the-art feature importance rankings, knock-in and knock-out strategies evaluate the block as a whole via a mutual information criterion. Our experiments consist of a simulation study validating all three approaches, followed by a case study on two distinct real-world datasets to compare the strategies. We conclude that each strategy has its merits for specific application scenarios.
    Procrastinated Tree Search: Black-box Optimization with Delayed, Noisy, and Multi-Fidelity Feedback. (arXiv:2110.07232v2 [cs.LG] UPDATED)
    In black-box optimization problems, we aim to maximize an unknown objective function, where the function is only accessible through feedbacks of an evaluation or simulation oracle. In real-life, the feedbacks of such oracles are often noisy and available after some unknown delay that may depend on the computation time of the oracle. Additionally, if the exact evaluations are expensive but coarse approximations are available at a lower cost, the feedbacks can have multi-fidelity. In order to address this problem, we propose a generic extension of hierarchical optimistic tree search (HOO), called ProCrastinated Tree Search (PCTS), that flexibly accommodates a delay and noise-tolerant bandit algorithm. We provide a generic proof technique to quantify regret of PCTS under delayed, noisy, and multi-fidelity feedbacks. Specifically, we derive regret bounds of PCTS enabled with delayed-UCB1 (DUCB1) and delayed-UCB-V (DUCBV) algorithms. Given a horizon $T$, PCTS retains the regret bound of non-delayed HOO for expected delay of $O(\log T)$ and worsens by $O(T^{\frac{1-\alpha}{d+2}})$ for expected delays of $O(T^{1-\alpha})$ for $\alpha \in (0,1]$. We experimentally validate on multiple synthetic functions and hyperparameter tuning problems that PCTS outperforms the state-of-the-art black-box optimization methods for feedbacks with different noise levels, delays, and fidelity.
    Fairness without Imputation: A Decision Tree Approach for Fair Prediction with Missing Values. (arXiv:2109.10431v2 [cs.LG] UPDATED)
    We investigate the fairness concerns of training a machine learning model using data with missing values. Even though there are a number of fairness intervention methods in the literature, most of them require a complete training set as input. In practice, data can have missing values, and data missing patterns can depend on group attributes (e.g. gender or race). Simply applying off-the-shelf fair learning algorithms to an imputed dataset may lead to an unfair model. In this paper, we first theoretically analyze different sources of discrimination risks when training with an imputed dataset. Then, we propose an integrated approach based on decision trees that does not require a separate process of imputation and learning. Instead, we train a tree with missing incorporated as attribute (MIA), which does not require explicit imputation, and we optimize a fairness-regularized objective function. We demonstrate that our approach outperforms existing fairness intervention methods applied to an imputed dataset, through several experiments on real-world datasets.
    Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms. (arXiv:2202.13001v3 [cs.LG] UPDATED)
    We study a sequential decision problem where the learner faces a sequence of $K$-armed stochastic bandit tasks. The tasks may be designed by an adversary, but the adversary is constrained to choose the optimal arm of each task in a smaller (but unknown) subset of $M$ arms. The task boundaries might be known (the bandit meta-learning setting), or unknown (the non-stationary bandit setting), and the number of tasks $N$ as well as the total number of rounds $T$ are known ($N$ could be unknown in the meta-learning setting). We design an algorithm based on a reduction to bandit submodular maximization, and show that its regret in both settings is smaller than the simple baseline of $\tilde{O}(\sqrt{KNT})$ that can be obtained by using standard algorithms designed for non-stationary bandit problems. For the bandit meta-learning problem with fixed task length $\tau$, we show that the regret of the algorithm is bounded as $\tilde{O}(N\sqrt{M \tau}+N^{2/3})$. Under additional assumptions on the identifiability of the optimal arms in each task, we show a bandit meta-learning algorithm with an improved $\tilde{O}(N\sqrt{M \tau}+N^{1/2})$ regret.
    Second Order Regret Bounds Against Generalized Expert Sequences under Partial Bandit Feedback. (arXiv:2204.06660v1 [cs.LG])
    We study the problem of expert advice under partial bandit feedback setting and create a sequential minimax optimal algorithm. Our algorithm works with a more general partial monitoring setting, where, in contrast to the classical bandit feedback, the losses can be revealed in an adversarial manner. Our algorithm adopts a universal prediction perspective, whose performance is analyzed with regret against a general expert selection sequence. The regret we study is against a general competition class that covers many settings (such as the switching or contextual experts settings) and the expert selection sequences in the competition class are determined by the application at hand. Our regret bounds are second order bounds in terms of the sum of squared losses and the normalized regret of our algorithm is invariant under arbitrary affine transforms of the loss sequence. Our algorithm is truly online and does not use any preliminary information about the loss sequences.  ( 2 min )
    To Split or Not to Split: The Impact of Disparate Treatment in Classification. (arXiv:2002.04788v4 [cs.LG] UPDATED)
    Disparate treatment occurs when a machine learning model yields different decisions for individuals based on a sensitive attribute (e.g., age, sex). In domains where prediction accuracy is paramount, it could potentially be acceptable to fit a model which exhibits disparate treatment. To evaluate the effect of disparate treatment, we compare the performance of split classifiers (i.e., classifiers trained and deployed separately on each group) with group-blind classifiers (i.e., classifiers which do not use a sensitive attribute). We introduce the benefit-of-splitting for quantifying the performance improvement by splitting classifiers. Computing the benefit-of-splitting directly from its definition could be intractable since it involves solving optimization problems over an infinite-dimensional functional space. Under different performance measures, we (i) prove an equivalent expression for the benefit-of-splitting which can be efficiently computed by solving small-scale convex programs; (ii) provide sharp upper and lower bounds for the benefit-of-splitting which reveal precise conditions where a group-blind classifier will always suffer from a non-trivial performance gap from the split classifiers. In the finite sample regime, splitting is not necessarily beneficial and we provide data-dependent bounds to understand this effect. Finally, we validate our theoretical results through numerical experiments on both synthetic and real-world datasets.  ( 2 min )
    Classification of Hyperspectral Images Using SVM with Shape-adaptive Reconstruction and Smoothed Total Variation. (arXiv:2203.15619v3 [cs.CV] UPDATED)
    In this work, a novel algorithm called SVM with Shape-adaptive Reconstruction and Smoothed Total Variation (SaR-SVM-STV) is introduced to classify hyperspectral images, which makes full use of spatial and spectral information. The Shape-adaptive Reconstruction (SaR) is introduced to preprocess each pixel based on the Pearson Correlation between pixels in its shape-adaptive (SA) region. Support Vector Machines (SVMs) are trained to estimate the pixel-wise probability maps of each class. Then the Smoothed Total Variation (STV) model is applied to denoise and generate the final classification map. Experiments show that SaR-SVM-STV outperforms the SVM-STV method with a few training labels, demonstrating the significance of reconstructing hyperspectral images before classification.  ( 2 min )
    Data Augmentation for Bayesian Deep Learning. (arXiv:1903.09668v3 [stat.ML] UPDATED)
    Deep Learning (DL) methods have emerged as one of the most powerful tools for functional approximation and prediction. While the representation properties of DL have been well studied, uncertainty quantification remains challenging and largely unexplored. Data augmentation techniques are a natural approach to provide uncertainty quantification and to incorporate stochastic Monte Carlo search into stochastic gradient descent (SGD) methods. The purpose of our paper is to show that training DL architectures with data augmentation leads to efficiency gains. We use the theory of scale mixtures of normals to derive data augmentation strategies for deep learning. This allows variants of the expectation-maximization and MCMC algorithms to be brought to bear on these high dimensional nonlinear deep learning models. To demonstrate our methodology, we develop data augmentation algorithms for a variety of commonly used activation functions: logit, ReLU, leaky ReLU and SVM. Our methodology is compared to traditional stochastic gradient descent with back-propagation. Our optimization procedure leads to a version of iteratively re-weighted least squares and can be implemented at scale with accelerated linear algebra methods providing substantial improvement in speed. We illustrate our methodology on a number of standard datasets. Finally, we conclude with directions for future research.  ( 2 min )
    Observable adjustments in single-index models for regularized M-estimators. (arXiv:2204.06990v1 [math.ST])
    We consider observations $(X,y)$ from single index models with unknown link function, Gaussian covariates and a regularized M-estimator $\hat\beta$ constructed from convex loss function and regularizer. In the regime where sample size $n$ and dimension $p$ are both increasing such that $p/n$ has a finite limit, the behavior of the empirical distribution of $\hat\beta$ and the predicted values $X\hat\beta$ has been previously characterized in a number of models: The empirical distributions are known to converge to proximal operators of the loss and penalty in a related Gaussian sequence model, which captures the interplay between ratio $p/n$, loss, regularization and the data generating process. This connection between$(\hat\beta,X\hat\beta)$ and the corresponding proximal operators require solving fixed-point equations that typically involve unobservable quantities such as the prior distribution on the index or the link function. This paper develops a different theory to describe the empirical distribution of $\hat\beta$ and $X\hat\beta$: Approximations of $(\hat\beta,X\hat\beta)$ in terms of proximal operators are provided that only involve observable adjustments. These proposed observable adjustments are data-driven, e.g., do not require prior knowledge of the index or the link function. These new adjustments yield confidence intervals for individual components of the index, as well as estimators of the correlation of $\hat\beta$ with the index. The interplay between loss, regularization and the model is thus captured in a data-driven manner, without solving the fixed-point equations studied in previous works. The results apply to both strongly convex regularizers and unregularized M-estimation. Simulations are provided for the square and logistic loss in single index models including logistic regression and 1-bit compressed sensing with 20\% corrupted bits.  ( 2 min )
    Concentration of Random Feature Matrices in High-Dimensions. (arXiv:2204.06935v1 [stat.ML])
    The spectra of random feature matrices provide essential information on the conditioning of the linear system used in random feature regression problems and are thus connected to the consistency and generalization of random feature models. Random feature matrices are asymmetric rectangular nonlinear matrices depending on two input variables, the data and the weights, which can make their characterization challenging. We consider two settings for the two input variables, either both are random variables or one is a random variable and the other is well-separated, i.e. there is a minimum distance between points. With conditions on the dimension, the complexity ratio, and the sampling variance, we show that the singular values of these matrices concentrate near their full expectation and near one with high-probability. In particular, since the dimension depends only on the logarithm of the number of random weights or the number of data points, our complexity bounds can be achieved even in moderate dimensions for many practical setting. The theoretical results are verified with numerical experiments.  ( 2 min )
    Modelling Non-Smooth Signals with Complex Spectral Structure. (arXiv:2203.06997v2 [stat.ML] UPDATED)
    The Gaussian Process Convolution Model (GPCM; Tobar et al., 2015a) is a model for signals with complex spectral structure. A significant limitation of the GPCM is that it assumes a rapidly decaying spectrum: it can only model smooth signals. Moreover, inference in the GPCM currently requires (1) a mean-field assumption, resulting in poorly calibrated uncertainties, and (2) a tedious variational optimisation of large covariance matrices. We redesign the GPCM model to induce a richer distribution over the spectrum with relaxed assumptions about smoothness: the Causal Gaussian Process Convolution Model (CGPCM) introduces a causality assumption into the GPCM, and the Rough Gaussian Process Convolution Model (RGPCM) can be interpreted as a Bayesian nonparametric generalisation of the fractional Ornstein-Uhlenbeck process. We also propose a more effective variational inference scheme, going beyond the mean-field assumption: we design a Gibbs sampler which directly samples from the optimal variational solution, circumventing any variational optimisation entirely. The proposed variations of the GPCM are validated in experiments on synthetic and real-world data, showing promising results.  ( 2 min )
    Wassmap: Wasserstein Isometric Mapping for Image Manifold Learning. (arXiv:2204.06645v1 [cs.LG])
    In this paper, we propose Wasserstein Isometric Mapping (Wassmap), a parameter-free nonlinear dimensionality reduction technique that provides solutions to some drawbacks in existing global nonlinear dimensionality reduction algorithms in imaging applications. Wassmap represents images via probability measures in Wasserstein space, then uses pairwise quadratic Wasserstein distances between the associated measures to produce a low-dimensional, approximately isometric embedding. We show that the algorithm is able to exactly recover parameters of some image manifolds including those generated by translations or dilations of a fixed generating measure. Additionally, we show that a discrete version of the algorithm retrieves parameters from manifolds generated from discrete measures by providing a theoretical bridge to transfer recovery results from functional data to discrete data. Testing of the proposed algorithms on various image data manifolds show that Wassmap yields good embeddings compared with other global techniques.  ( 2 min )
    Regret, stability & fairness in matching markets with bandit learners. (arXiv:2102.06246v2 [cs.LG] UPDATED)
    Making an informed decision -- for example, when choosing a career or housing -- requires knowledge about the available options. Such knowledge is generally acquired through costly trial and error, but this learning process can be disrupted by competition. In this work, we study how competition affects the long-term outcomes of individuals as they learn. We build on a line of work that models this setting as a two-sided matching market with bandit learners. A recent result in this area states that it is impossible to simultaneously guarantee two natural desiderata: stability and low optimal regret for all agents. Resource-allocating platforms can point to this result as a justification for assigning good long-term outcomes to some agents and poor ones to others. We show that this impossibility need not hold true. In particular, by modeling two additional components of competition -- namely, costs and transfers -- we prove that it is possible to simultaneously guarantee four desiderata: stability, low optimal regret, fairness in the distribution of regret, and high social welfare.  ( 2 min )
    Achieving Representative Data via Convex Hull Feasibility Sampling Algorithms. (arXiv:2204.06664v1 [stat.ML])
    Sampling biases in training data are a major source of algorithmic biases in machine learning systems. Although there are many methods that attempt to mitigate such algorithmic biases during training, the most direct and obvious way is simply collecting more representative training data. In this paper, we consider the task of assembling a training dataset in which minority groups are adequately represented from a given set of data sources. In essence, this is an adaptive sampling problem to determine if a given point lies in the convex hull of the means from a set of unknown distributions. We present adaptive sampling methods to determine, with high confidence, whether it is possible to assemble a representative dataset from the given data sources. We also demonstrate the efficacy of our policies in simulations in the Bernoulli and a multinomial setting.  ( 2 min )
    Group-Sparse Matrix Factorization for Transfer Learning of Word Embeddings. (arXiv:2104.08928v2 [stat.ML] UPDATED)
    Unstructured text provides decision-makers with a rich data source in many domains, ranging from product reviews in retailing to nursing notes in healthcare. To leverage this information, words are typically translated into word embeddings -- vectors that encode the semantic relationships between words -- through unsupervised learning algorithms such as matrix factorization. However, learning word embeddings from new domains with limited training data can be challenging, because the meaning/usage may be different in the new domain, e.g., the word "positive" typically has positive sentiment, but often has negative sentiment in medical notes since it may imply that a patient is tested positive for a disease. Intuitively, we expect that only a small number of domain-specific words may have new meanings/usages. We propose an intuitive two-stage estimator that exploits this structure via a group-sparse penalty to efficiently transfer learn domain-specific word embeddings by combining large-scale text corpora (such as Wikipedia) with limited domain-specific text data. We bound the generalization error of our estimator, proving that it can achieve the same accuracy (compared to not transfer learning) with substantially less domain-specific data when only a small number of embeddings are altered between domains. Our results provide the first bounds on group-sparse matrix factorization, which may be of independent interest. We empirically evaluate the effectiveness of our approach compared to state-of-the-art fine-tuning heuristics from natural language processing.  ( 2 min )
    Finding MNEMON: Reviving Memories of Node Embeddings. (arXiv:2204.06963v1 [cs.LG])
    Previous security research efforts orbiting around graphs have been exclusively focusing on either (de-)anonymizing the graphs or understanding the security and privacy issues of graph neural networks. Little attention has been paid to understand the privacy risks of integrating the output from graph embedding models (e.g., node embeddings) with complex downstream machine learning pipelines. In this paper, we fill this gap and propose a novel model-agnostic graph recovery attack that exploits the implicit graph structural information preserved in the embeddings of graph nodes. We show that an adversary can recover edges with decent accuracy by only gaining access to the node embedding matrix of the original graph without interactions with the node embedding models. We demonstrate the effectiveness and applicability of our graph recovery attack through extensive experiments.  ( 2 min )
    Streamable Neural Audio Synthesis With Non-Causal Convolutions. (arXiv:2204.07064v1 [cs.SD])
    Deep learning models are mostly used in an offline inference fashion. However, this strongly limits the use of these models inside audio generation setups, as most creative workflows are based on real-time digital signal processing. Although approaches based on recurrent networks can be naturally adapted to this buffer-based computation, the use of convolutions still poses some serious challenges. To tackle this issue, the use of causal streaming convolutions have been proposed. However, this requires specific complexified training and can impact the resulting audio quality. In this paper, we introduce a new method allowing to produce non-causal streaming models. This allows to make any convolutional model compatible with real-time buffer-based processing. As our method is based on a post-training reconfiguration of the model, we show that it is able to transform models trained without causal constraints into a streaming model. We show how our method can be adapted to fit complex architectures with parallel branches. To evaluate our method, we apply it on the recent RAVE model, which provides high-quality real-time audio synthesis. We test our approach on multiple music and speech datasets and show that it is faster than overlap-add methods, while having no impact on the generation quality. Finally, we introduce two open-source implementation of our work as Max/MSP and PureData externals, and as a VST audio plugin. This allows to endow traditional digital audio workstation with real-time neural audio synthesis on a laptop CPU.  ( 2 min )
    Kernel Thinning. (arXiv:2105.05842v7 [stat.ML] UPDATED)
    We introduce kernel thinning, a new procedure for compressing a distribution $\mathbb{P}$ more effectively than i.i.d. sampling or standard thinning. Given a suitable reproducing kernel $\mathbf{k}$ and $\mathcal{O}(n^2)$ time, kernel thinning compresses an $n$-point approximation to $\mathbb{P}$ into a $\sqrt{n}$-point approximation with comparable worst-case integration error across the associated reproducing kernel Hilbert space. With high probability, the maximum discrepancy in integration error is $\mathcal{O}_d(n^{-1/2}\sqrt{\log n})$ for compactly supported $\mathbb{P}$ and $\mathcal{O}_d(n^{-\frac{1}{2}} (\log n)^{(d+1)/2}\sqrt{\log\log n})$ for sub-exponential $\mathbb{P}$ on $\mathbb{R}^d$. In contrast, an equal-sized i.i.d. sample from $\mathbb{P}$ suffers $\Omega(n^{-1/4})$ integration error. Our sub-exponential guarantees resemble the classical quasi-Monte Carlo error rates for uniform $\mathbb{P}$ on $[0,1]^d$ but apply to general distributions on $\mathbb{R}^d$ and a wide range of common kernels. We use our results to derive explicit non-asymptotic maximum mean discrepancy bounds for Gaussian, Mat\'ern, and B-spline kernels and present two vignettes illustrating the practical benefits of kernel thinning over i.i.d. sampling and standard Markov chain Monte Carlo thinning, in dimensions $d=2$ through $100$.  ( 2 min )
    Program Analysis of Probabilistic Programs. (arXiv:2204.06868v1 [cs.PL])
    Probabilistic programming is a growing area that strives to make statistical analysis more accessible, by separating probabilistic modelling from probabilistic inference. In practice this decoupling is difficult. No single inference algorithm can be used as a probabilistic programming back-end that is simultaneously reliable, efficient, black-box, and general. Probabilistic programming languages often choose a single algorithm to apply to a given problem, thus inheriting its limitations. While substantial work has been done both to formalise probabilistic programming and to improve efficiency of inference, there has been little work that makes use of the available program structure, by formally analysing it, to better utilise the underlying inference algorithm. This dissertation presents three novel techniques (both static and dynamic), which aim to improve probabilistic programming using program analysis. The techniques analyse a probabilistic program and adapt it to make inference more efficient, sometimes in a way that would have been tedious or impossible to do by hand.  ( 2 min )
    Optimal Stopping via Randomized Neural Networks. (arXiv:2104.13669v2 [stat.ML] UPDATED)
    This paper presents new machine learning approaches to approximate the solutions of optimal stopping problems. The key idea of these methods is to use neural networks, where the parameters of the hidden layers are generated randomly and only the last layer is trained, in order to approximate the continuation value. Our approaches are applicable to high dimensional problems where the existing approaches become increasingly impractical. In addition, since our approaches can be optimized using simple linear regression, they are easy to implement and theoretical guarantees are provided. Our randomized reinforcement learning approach and randomized recurrent neural network approach outperform the state-of-the-art and other relevant machine learning approaches in Markovian and non-Markovian examples, respectively. In particular, we test our approaches on Black-Scholes, Heston, rough Heston and fractional Brownian motion. Moreover, we show that they can also be used to efficiently compute Greeks of American options.  ( 2 min )
    Semi-Discriminative Representation Loss for Online Continual Learning. (arXiv:2006.11234v4 [stat.ML] UPDATED)
    The use of episodic memory in continual learning has demonstrated effectiveness for alleviating catastrophic forgetting. In recent studies, gradient-based approaches have been developed to make more efficient use of compact episodic memory. Such approaches refine the gradients resulting from new samples by those from memorized samples, aiming to reduce the diversity of gradients from different tasks. In this paper, we clarify the relation between diversity of gradients and discriminativeness of representations, showing shared as well as conflicting interests between Deep Metric Learning and continual learning, thus demonstrating pros and cons of learning discriminative representations in continual learning. Based on these findings, we propose a simple method -- Semi-Discriminative Representation Loss (SDRL) -- for continual learning. In comparison with state-of-the-art methods, SDRL shows better performance with low computational cost on multiple benchmark tasks in the setting of online continual learning.  ( 2 min )
    Optimal Training of Fair Predictive Models. (arXiv:1910.04109v3 [stat.ML] UPDATED)
    Recently there has been sustained interest in modifying prediction algorithms to satisfy fairness constraints. These constraints are typically complex nonlinear functionals of the observed data distribution. Focusing on the path-specific causal constraints proposed by Nabi and Shpitser (2018), we introduce new theoretical results and optimization techniques to make model training easier and more accurate. Specifically, we show how to reparameterize the observed data likelihood such that fairness constraints correspond directly to parameters that appear in the likelihood, transforming a complex constrained optimization objective into a simple optimization problem with box constraints. We also exploit methods from empirical likelihood theory in statistics to improve predictive performance by constraining baseline covariates, without requiring parametric models. We combine the merits of both proposals to optimize a hybrid reparameterized likelihood. The techniques presented here should be applicable more broadly to fair prediction proposals that impose constraints on predictive models.  ( 2 min )
    Learning Optimal Dynamic Treatment Regimes Using Causal Tree Methods in Medicine. (arXiv:2204.07124v1 [stat.ML])
    Dynamic treatment regimes (DTRs) are used in medicine to tailor sequential treatment decisions to patients by considering patient heterogeneity. Common methods for learning optimal DTRs, however, have shortcomings: they are typically based on outcome prediction and not treatment effect estimation, or they use linear models that are restrictive for patient data from modern electronic health records. To address these shortcomings, we develop two novel methods for learning optimal DTRs that effectively handle complex patient data. We call our methods DTR-CT and DTR-CF. Our methods are based on a data-driven estimation of heterogeneous treatment effects using causal tree methods, specifically causal trees and causal forests, that learn non-linear relationships, control for time-varying confounding, are doubly robust, and explainable. To the best of our knowledge, our paper is the first that adapts causal tree methods for learning optimal DTRs. We evaluate our proposed methods using synthetic data and then apply them to real-world data from intensive care units. Our methods outperform state-of-the-art baselines in terms of cumulative regret and percentage of optimal decisions by a considerable margin. Our work improves treatment recommendations from electronic health record and is thus of direct relevance for personalized medicine.  ( 2 min )
    Gradient boosting for convex cone predict and optimize problems. (arXiv:2204.06895v1 [cs.LG])
    Many problems in engineering and statistics involve both predictive forecasting and decision-based optimization. Traditionally, predictive models are optimized independently from the final decision-based optimization problem. In contrast, a `smart, predict then optimize' (SPO) framework optimizes prediction models to explicitly minimize the final downstream decision loss. In this paper we present dboost, a gradient boosting algorithm for training prediction model ensembles to minimize decision regret. The dboost framework supports any convex optimization program that can be cast as convex quadratic cone program and gradient boosting is performed by implicit differentiation of a custom fixed-point mapping. To our knowledge, the dboost framework is the first general purpose implementation of gradient boosting to predict and optimize problems. Experimental results comparing with state-of-the-art SPO methods show that dboost can further reduce out-of-sample decision regret.  ( 2 min )
    Streamlined Variational Inference for Linear Mixed Models with Crossed Random Effects. (arXiv:1910.01799v3 [stat.ME] UPDATED)
    We derive streamlined mean field variational Bayes algorithms for fitting linear mixed models with crossed random effects. In the most general situation, where the dimensions of the crossed groups are arbitrarily large, streamlining is hindered by lack of sparseness in the underlying least squares system. Because of this fact we also consider a hierarchy of relaxations of the mean field product restriction. The least stringent product restriction delivers a high degree of inferential accuracy. However, this accuracy must be mitigated against its higher storage and computing demands. Faster sparse storage and computing alternatives are also provided, but come with the price of diminished inferential accuracy. This article provides full algorithmic details of three variational inference strategies, presents detailed empirical results on their pros and cons and, thus, guides the users on their choice of variational inference approach depending on the problem size and computing resources.  ( 2 min )
    Using Machine Learning for Particle Identification in ALICE. (arXiv:2204.06900v1 [nucl-ex])
    Particle identification (PID) is one of the main strengths of the ALICE experiment at the LHC. It is a crucial ingredient for detailed studies of the strongly interacting matter formed in ultrarelativistic heavy-ion collisions. ALICE provides PID information via various experimental techniques, allowing for the identification of particles over a broad momentum range (from around 100 MeV/$c$ to around 50 GeV/$c$). The main challenge is how to combine the information from various detectors effectively. Therefore, PID represents a model classification problem, which can be addressed using Machine Learning (ML) solutions. Moreover, the complexity of the detector and richness of the detection techniques make PID an interesting area of research also for the computer science community. In this work, we show the current status of the ML approach to PID in ALICE. We discuss the preliminary work with the Random Forest approach for the LHC Run 2 and a more advanced solution based on Domain Adaptation Neural Networks, including a proposal for its future implementation within the ALICE computing software for the upcoming LHC Run 3.  ( 2 min )

  • Open

    [D] What is the difference between channel-wise and self attention in this case?
    Example: I fed 32 feature maps of dimension 6x6x32 into a Squeeze and Excitation layer, which assigns a weight to each of my channel through a channel-wise attention mechanism. What is the difference between passing these 32 feature maps into a Hybrid Transformer Encoder with patch of dimension 6x6? (So 1 patch for each channel) As I understand it, channel attention says "which channel is important for the final prediction". While transformer (with self attention) tells us "where to focus our attention in a given context". Isn't that the same if the patches are the channels? Basically it tells us on which patch to focus, and if patch=channel then squeeze excitation = self attention ? submitted by /u/Rogitus [link] [comments]  ( 1 min )
    [P] Bounding.ai Launches New Marketplace for AI Labeled Data
    In a new announcement, Bounding.ai launched its marketplace for computer vision and AI teams to access training data easily. The platform is designed to empower individuals and small companies around the world to create and sell datasets that will be instantly accessible by any team in need of labeled data. Bounding.ai Launches New Marketplace for AI Labeled Data & $5,000 Prize submitted by /u/Freyr_AI [link] [comments]  ( 1 min )
    [D]Unsupervised classification of words/phrases?
    I have found most unsupervised text classification methods to be mostly suitable for classifying documents containing relatively large amounts of words/sentences. However, I have a dataset with entries containing only single words or phrases but not full sentences. The goal is to do unsupervised semantic classification on these words/phrases. Are there any existing algorithms for such a task? submitted by /u/Comprehensive-Egg707 [link] [comments]  ( 1 min )
    [N] Robot Arm Acts As "Hand And Eyes" of Language Model To Execute Real World Tasks With SayCan And Robotics At Google
    Large language models can encode a wealth of semantic knowledge about the world. Such knowledge could in principle be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language. However, a significant weakness of language models is that they lack contextual grounding, which makes it difficult to leverage them for decision making within a given real-world context. For example, asking a language model to describe how to clean a spill might result in a reasonable narrative, but it may not be applicable to a particular agent, such as a robot, that needs to perform this task in a particular environment. We propose to provide this grounding by means of pretrained behaviors, which are used to condition the model to propose natural language actions that are both feasible and contextually appropriate. The robot can act as the language model’s “hands and eyes,” while the language model supplies high-level semantic knowledge about the task. We show how low-level tasks can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally extended instructions, while value functions associated with these tasks provide the grounding necessary to connect this knowledge to a particular physical environment. We evaluate our method on a number of real-world robotic tasks, where we show that this approach is capable of completing long-horizon, abstract, natural language instructions on a mobile manipulator. Github: https://say-can.github.io/ Video of Robot Executing Commands: https://youtu.be/zOph99BjRqs?t=4 submitted by /u/SlightSituation [link] [comments]  ( 1 min )
    [D] AskScience AMA Series: We are seven leading scientists specializing in the intersection of machine learning and neuroscience. Ask Us Anything about computational neuroscience or science education!
    submitted by /u/blueneuronDOTnet [link] [comments]  ( 2 min )
    [D] How DALL-E 2 Actually Works
    Here's a video explaining the overall architecture of DALL-E 2 and how it actually works! Great overview for those who haven't had time to read the paper How does DALL-E 2 actually work? submitted by /u/SleekEagle [link] [comments]  ( 1 min )
    [N] Announcing the Learning on Graphs Conference!
    We think this new venue will be valuable for the Graph/Geometric Machine Learning community. Why? See our blogpost: https://michael-bronstein.medium.com/announcing-the-learning-on-graphs-conference-c63caed7347 The LoG Conference key facts: - Covers work broadly related to machine learning on graphs and geometry - Proceedings track published in PMLR - Also has a non-archival extended abstract track - Double blind review process on OpenReview - Top reviewers receive monetary rewards - First year: virtual December 9-12 2022, free to attend. Call for papers: https://logconference.github.io/cfp/ Stay updated via Twitter: https://twitter.com/LogConference Or LinkedIn: https://www.linkedin.com/company/log-conference Advisory board: Regina Barzilay (MIT), Xavier Bresson (NUS), Michael Bronstein (Oxford/Twitter), Stephan Günnemann (TUM), Stefanie Jegelka (MIT), Jure Leskovec (Stanford), Pietro Liò (Cambridge), Jian Tang (MILA/HEC Montreal), Jie Tang (Tsinghua), Petar Veličković (DeepMind), Soledad Villar (JHU), Marinka Zitnik (Harvard). Organizers: Yuanqi Du (DP Technology), Hannes Stärk (MIT), Derek Lim (MIT), Chaitanya Joshi (Cambridge), Andreea-Ioana Deac (Mila), Iulia Duta (Cambridge), Joshua Robinson (MIT). submitted by /u/Hannes-Stark [link] [comments]  ( 1 min )
    [P] Using Language Models to (probably) Read Faster
    I explored using language models to highlight more salient parts of a PDF file which hopefully help users to read faster. The main idea is to highlight only the characters which language model failed to predict. I have implemented this as an experimental feature in sioyek PDF reader. Here is a blog post explaining this in full detail: https://ahrm.github.io/jekyll/update/2022/04/14/using-languge-models-to-read-faster.html submitted by /u/highergraphic [link] [comments]  ( 2 min )
    [P] GANs and generating visually indeterminate images by error
    (Please correct me if I'm using the wrong flair/on the wrong sub) I'm currently working on a project that focuses on GANs and generative art, particularly images that concern visual indeterminacy. I trying to find papers/articles that discuss the development/application of (any kind of) GAN in which along the way or as a final result, images were generated that would be considered visually indeterminate. Specifically, research in which the objective was to generate images with clear, recognizable objects/scenes. In my mind I'm looking for articles in which the GAN architecture is discussed and in which ways what parts of it could've influenced the particular aspects of the incorrectly generated image. This probably wouldn't be the focus of any research but I was wondering if anyone has ever come across a discussion section in a GAN paper or could point me towards some areas or projects where I might find something that I could connect to my project. submitted by /u/mel4ncholi4 [link] [comments]  ( 1 min )
    [D] Ensemble methods (e.g. hard voting) in machine learning
    When should we consider ensemble methods in machine learning? Is there any statistical criteria using which we can decide, if doing ensemble may help? submitted by /u/flaubart9 [link] [comments]  ( 2 min )
    [D] How do you understand "Both FF𝐿 and FF𝑆 were 3-layer feed-forward networks with hidden dimensions of 1024 and 256, GeLU as the activation function and a dropout with probability 0.1 applied at their input."? (re-implementing https://arxiv.org/abs/2101.10587v1)
    Hi, I am re-implementing the paper "Low Resource Recognition and Linking of Biomedical Concepts from a Large Ontology" https://arxiv.org/abs/2101.10587v1 . They describe one part of their model as "3-layer feed-forward networks with hidden dimensions of 1024 and 256, GeLU as the activation function and a dropout with probability 0.1 applied at their input." For me that's not enough information to uniquely characterize the network, but maybe for someone with more experience the intended structure is obvious. The output should be a scalar, so i assume that it's something like: Dropout(0.1) -> Linear(, 1024) -> Linear(1024, 256) -> GELU -> Linear(256, 1) ? Or is the nonlinearity (GELU) normally applied after each step? submitted by /u/ldorigo [link] [comments]
    Why did SciNet not get more attention? [D]
    It seems to shatter previous benchmarks with a new, innovative architecture, yet it only has 3 citations and little to no attention from the community as far as I can see. Is it because time series forecasting is not very trendy right now or is there anything wrong with the paper? The paper in question: https://arxiv.org/pdf/2106.09305v2.pdf submitted by /u/vidul7498 [link] [comments]  ( 2 min )
    [D] Do you train and deploy models using just one framework or multiple frameworks at work?
    Hi, I'm the creator of Pinferencia. Currently I'm design new features to-do list. I want to know: Do you train and deploy models using just one framework or multiple frameworks at work? For example, use pytorch for training and deployment, or use tensorflow, pytorch for training, onnx for deployment. View Poll submitted by /u/Remote_Cancel_7977 [link] [comments]  ( 2 min )
    [P] Extremely short and simple implementation of Denoising Diffusion Model, for educational purpose
    ​ Randomly sampled MNIST output. It's not good I know. Hi, I noticed there aren't that many simple implementation of DDPM, for example, using MNIST. I had to make a presentation for my workplace seminar, so I had to implement the simplified version of DDPM myself. The whole thing is under 200 lines of code https://github.com/cloneofsimo/minDiffusion This implementation has MANY missing details, such as Unet Models etc. I think it is worth taking a look, especially if you are interested in recent boom of diffusion models (such as Dalle 2) submitted by /u/cloneofsimo [link] [comments]  ( 1 min )
    [D] Kubernetes for ML - how are y'all doing it?
    Have been involved with Mesos since 2013, and Kubernetes almost since it's inception (and saw it win the "scheduler wars"). And now being used for pretty much _all_ container workloads, including ML training and inference. Since it was built in the image of Borg (where search indexers and map reduce jobs were preemptible, and serving search workloads had to be protected at all cost)[1], how is Kubernetes holding up for your current workflows? Are you using Kubeflow? metaflow? bespoke setup on top? [1] https://queue.acm.org/detail.cfm?id=2898444 submitted by /u/nqnielsen [link] [comments]  ( 2 min )
  • Open

    New to machine learning, want to simulate robotics in a 3d environment
    My employer makes significant use of robotic weld cells, and while working with the equipment I've noticed what seems to be room for improvement in the programming. This is purely a personal academic project, as I am quite curious on if machine learning could produce comparable or superior results to the human-made programming used at work. However, as such there will unfortunately be areas of vagueness because I need to stick to knowledge that is publicly available regarding their operations. I'm going to have to stick to more generic, publicly available reference material, and will not be able to share most, if any, of the end result. I would like to run simulations in a 3D environment, using machine learning to train a computer program to find the most efficient sequence of movements &…  ( 3 min )
    Does using a centralized critic always mean that the agents receive global observation?
    submitted by /u/No_Possibility_7588 [link] [comments]
    What algorithm would be suited for a “Just do it as good as you can” situation?
    I’m really new to RL so please bear with me if I’m making mistakes here, but I’m trying to make an environment that emulates a network of roads. The algorithm will need to generate a quick route between n destinations when n equals some number with an insane amount of permutations, like 30 for example. This is like emulating the destinations required by a mailman’s route on a map, and trying to find the fastest way to get to each one. The algorithms sequence of decisions will be choosing a node to travel to, while each node represents an street intersection or point where the street ends. By the time it’s traveled to every destination using the nodes, it’ll review the network of nodes it used and sum the distance between each one to get total distance of route. The goal is to get the total distance as small as possible. Is this realistic for a RL problem, or do I need to try to engineer some way to determine if every decision was either good or bad? Could I build a mathematical way to approximate the quickest route and then reward the RL algorithm by generating a better route than the mathematically approximated one? I could try rewarding the algorithm at each decision by whether it reduced the total distance required to any target it has yet to visit. I could try to mathematically make this more viable… what do y’all think,should I do something like that? Am I headed in the right direction? Thanks for any and all help! submitted by /u/professorDissociate [link] [comments]  ( 3 min )
    Where is env.nS for Frozen Lake in OpenAI Gym
    I am trying to run this: env4 = FrozenLakeEnv(map_name='4x4', is_slippery=False) env4.nS ​ I then get this error: 'FrozenLakeEnv' object has no attribute 'nS' ​ But I see it in the source code on line 151 and 152: https://github.com/openai/gym/blob/master/gym/envs/toy_text/frozen_lake.py ​ Edit: I'm trying to follow along with some tutorials online. Thank you for the help! submitted by /u/postdoc403b [link] [comments]  ( 1 min )
    Getting max/min action in DDPG and TD3
    I am using DDPG for a custom environment. My reward is positive (the sum-rate in a communication system). My problem is that I get the max or min action after a few training steps and it saturates with a non-optimized solution. How can I address this problem? I tried redesigning my reward to include positive and negative values but it didn’t work. I read that some people are using reward scaling. What is it and how would I scale it? I mean is there a specific method? I couldn’t find enough resources on that. Any help is much appreciated! submitted by /u/alicefaisal [link] [comments]  ( 1 min )
    Question about pseudocodes
    Hi I'm redoing all the RL algorithms in python, to better understanding them. I'm mostly following Sutton and Barto but the pseudo code there is often hard to follow. Do you know any other place where I can look at? submitted by /u/New_neanderthal [link] [comments]  ( 1 min )
    Industry use of reinforcement learning
    I have been studying RL now for 18 months as a goal to get a job in it. Yet when I look at jobs, I see very seldom postings about it. I am wondering why is it the case ? From my current understanding I could think of dozens of applications with huge potential gains. It feel like an untapped potential. Or am I missing something ? What do you think is the big obstacle to wider adoption to RL ? Do you think it overlaps with classical control at the moment and is not justified ? submitted by /u/Ouassimf [link] [comments]  ( 4 min )
    Comparing Default VS Custom Reward Function for Optimal Health Management of a DeepRL Agent Playing Tekken
    submitted by /u/DIAMBRA_AIArena [link] [comments]  ( 2 min )
  • Open

    Fine-tune and deploy a Wav2Vec2 model for speech recognition with Hugging Face and Amazon SageMaker
    Automatic speech recognition (ASR) is a commonly used machine learning (ML) technology in our daily lives and business scenarios. Applications such as voice-controlled assistants like Alexa and Siri, and voice-to-text applications like automatic subtitling for videos and transcribing meetings, are all powered by this technology. These applications take audio clips as input and convert speech […]  ( 11 min )
    Build a virtual credit approval agent with Amazon Lex, Amazon Textract, and Amazon Connect
    Banking and financial institutions review thousands of credit applications per week. The credit approval process requires financial organizations to invest time and resources in reviewing documents like W2s, bank statements, and utility bills. The overall experience can be costly for the organization. At the same time, organizations have to consider borrowers, who are waiting for […]  ( 8 min )
  • Open

    AWS Cloud Migration: All You Need to Know
    Businesses today face myriad challenges, some of which are successfully addressed with help from cloud computing. This is where AWS cloud migration which promises to be a boon for businesses grappling with a sudden increase in traffic or for those who are looking for accelerated app deployment. It is also handy for cautious businesses that… Read More »AWS Cloud Migration: All You Need to Know The post AWS Cloud Migration: All You Need to Know appeared first on Data Science Central.  ( 3 min )
  • Open

    Web Crawling in Python
    In the old days, it was a tedious job to collect data, and sometimes very expensive. Machine learning projects cannot […] The post Web Crawling in Python appeared first on Machine Learning Mastery.  ( 12 min )
  • Open

    Kamikaze Drones in Russia’s War Against Ukraine Point to Future "Killer Robots"
    submitted by /u/regalalgorithm [link] [comments]  ( 1 min )
    AI News | Breakthrough AI Robot Arm Understanding From Google | OpenAI DALL-E 2 | AI Edge Computing In Space
    submitted by /u/getrich_or_diemining [link] [comments]  ( 1 min )
    DALL-E (Zero-Shot Text-to-Image Generation) -PART(1/2)
    OpenAI released DALL E2 in the last week, this system is basically have a capability of generating an image from a text description. Some of the results were truly amazing. In this blog, I tried to discuss the ideas around DALL-E (version 1) . DALL-E consist of two main components d-VAE(discrete-Variational Auto Encoder) and Auto-regressive transformer. In Part-1 I focused on d-VAE part where I tried to talk about basic VAE and it's ELBO formulation, VQ-VAE eventually that leads to d-VAE. It's reconstruction loss is formulated from Logit Laplcae (bounded) unlike typical L1 or L2. Overall this part explains about how a discrete vector(token) can be generated for an input image. submitted by /u/rakshith291 [link] [comments]  ( 1 min )
    My first attempt at machine learning. I made a cool chatbot 😎
    I made a self learning conversational chatbot in ReactJS. It does nothing but reply to user messages and only understands text, for now 😄 https://xalen.netlify.app What do you think? Yea or Nay? submitted by /u/GameTide [link] [comments]  ( 1 min )
    Music video about AI
    submitted by /u/starlightinspace [link] [comments]
    Computational reasoning about incomputability, infinity, truth etc (Gödel, Tarski,...)
    So I would be curious about the theoretical foundations how to make sense of higher-level abstract reasoning like reasoning about infinities, incomputability, truth (which we know cannot be defined due to Tarski) in the field of artificial intelligence. It seems due to Gödel-like constructions you are forced into inconsistent systems of reasoning when operating within a computable system. But those prove everything and "nothing", so as far as I understand it, it kind of upends the whole system of reason that the notion of artificial intelligence (and correct functioning of it) is based in. Personally due to this I don't see that the notion in the title it is a particularly coherent notion, which means there is somewhat strong limits on what (computable) AI will be able to do. But I would be curious how people that think otherwise (which seem most in the AI community?) approach this. Would you say somehow inconsistency can be avoided, or that despite inconsistency you can get reliably correct results? submitted by /u/bejaq [link] [comments]  ( 6 min )
    The best explanation of What is Machine Learning and How it works? MUST WATCH
    submitted by /u/mr-minion [link] [comments]
    Artificial Nightmares: Hills Have Eyes || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
  • Open

    DALL-E (Zero-Shot Text-to-Image Generation) -PART(1/2)
    OpenAI released DALL E2 in the last week, this system is basically have a capability of generating an image from a text description. Some of the results were truly amazing. In this blog, I tried to discuss the ideas around DALL-E (version 1) . DALL-E consist of two main components d-VAE(discrete-Variational Auto Encoder) and Auto-regressive transformer. In Part-1 I focused on d-VAE part where I tried to talk about basic VAE and it's ELBO formulation, VQ-VAE eventually that leads to d-VAE. It's reconstruction loss is formulated from Logit Laplcae (bounded) unlike typical L1 or L2. Overall this part explains about how a discrete vector(token) can be generated for an input image. submitted by /u/rakshith291 [link] [comments]  ( 1 min )
    The best explanation of What is Machine Learning and How it works? MUST WATCH
    submitted by /u/mr-minion [link] [comments]  ( 1 min )
    how do I fix this (I'm trying to predict sine (the blue dots are it's guesses and the white line is the "correct" answer)
    submitted by /u/-i-hate-this-place- [link] [comments]  ( 2 min )
    Neuroevolution of Augmenting Topologies Course
    Hey all, There's a new course on the Neuroevolution of Augmenting Topologies (NEAT) algorithm. It's a niche algorithm, but uses some very interesting mechanisms to train/evolve simple irregular neural networks. Thought some of you may be interested. submitted by /u/Cogitarius [link] [comments]
  • Open

    Startup Transforms Meeting Notes With Time-Saving Features
    Gil Makleff and Artem Koren are developing AI for meeting transcripts, creating time-savers like shareable highlights of the text that is often TL;DR (too long; didn’t read). The Sembly founders conceived the idea after years of working in enterprise operational consulting at UMT Consulting Group, which was acquired by Ernst & Young. “We had an Read article > The post Startup Transforms Meeting Notes With Time-Saving Features appeared first on NVIDIA Blog.  ( 3 min )
    A Night to Behold: Researchers Use Deep Learning to Bring Color to Night Vision
    Talk about a bright idea. A team of scientists has used GPU-accelerated deep learning to show how color can be brought to night-vision systems.  In a paper published this week in the journal PLOS One, a team of researchers at the University of California, Irvine led by Professor Pierre Baldi and Dr. Andrew Browne, describes how Read article > The post A Night to Behold: Researchers Use Deep Learning to Bring Color to Night Vision appeared first on NVIDIA Blog.  ( 3 min )
  • Open

    Data Scientists vs. BI Developer: What’s the Difference?
    Here’s the truth.  ( 1 min )
  • Open

    Learning to think critically about machine learning
    A multidisciplinary team of graduate students helps infuse ethical computing content into MIT’s largest machine learning course.  ( 7 min )

  • Open

    [D] LOOCV
    I'm working with a small dataset (~400 labeled data). I plan to use logistic regression. Does it make sense/is it necessary to have a hold-out validation set along with doing Leave-one out cross validation (LOOCV) (E.g. leave 20% out, and train model on LOOCV)? submitted by /u/yontbont1 [link] [comments]  ( 1 min )
    [D] What's the probability distribution of the Feature importances in an ensemble method?
    Assuming feature importance as defined by mean decrease in impurities. I'm curious if there are any studies about their distribution. I'm thinking about using a statistical test to check if a feature is relevant or not, all I can find is using the standard deviation as a measurement of noise. Additionally I imagine if we can give the probability of one feature being more relevant than another given their feature importances submitted by /u/FellowOfHorses [link] [comments]  ( 1 min )
    [Discussion] Collecting Feedback for FinRL: Financial Reinforcement Learning
    Dear all, As a creator of the open-source FinRL project, I would like to welcome all kinds of feedback regarding financial reinforcement learning, especially about how to improve the open-source project FinRL. After several years of development and maintenance, we have passed the phase of caring about #stars, now we care more about #downloads, also Wall Street's adoption. Appreciate your feedback and sharing! Previously when we exposed our message on Reddit, the community was not very supportive about open-source projects' "advertisements". Maybe it consumed public attention and raised bad feelings. Therefore, this time we created a reddit sub-channel for FinRL-related discussions, available at: https://www.reddit.com/r/AI4Finance_FinRL/ Best, Yang submitted by /u/Character-Meat-9176 [link] [comments]  ( 1 min )
    [D] What fun things in ML would you give a presentation on?
    If you had 30 minutes to present something fun and exciting to a semi-technical audience, what would you talk about on Machine Learning that would gain interest and engagement? submitted by /u/aero_gsr [link] [comments]  ( 1 min )
    [D] Evaluation and iteration for production models - how?
    How do you evaluate and improve your models in production (particularly for complex modalities like text/vision/audio)? Good models are hard In my experience from managing our CNN-based text classification & NER model at a small media analytics startup, evaluating and improving models is a mess. Our domain is fairly niche and diverse, so getting enough training data has been challenging and I mix in custom synthetics & augmentations (which can cause weird model artifacts if you're not careful). It takes a lot of time to discover tricky failure cases by either 1) observing production traffic or 2) probing manually, and then it takes even more time to get the right data to improve model behavior. Are good models hard? What's your approach to model evaluation & targeted improvement? Are there any known best practices? I'm a bit at a loss here. As mentioned, I'm specifically interested in others who have deep models as an important part of their product or pipeline across any task or modality. More particularly: How wrong is your model? How do you test it? How would you know about errors before and after it's deployed? How much of your time do you spend on iterating on your models? For what kind of issue? Which aspects are most useful to you for improving model performance and reducing critical errors? Maybe I'll take some of the more general ideas from my work and build them out into an evaluation & iteration framework. It's currently a hybrid web of synthetic, interactive/probing and classical approaches. Or maybe there is some approach/library that makes iteration easier without me having to do anything :) submitted by /u/flotothemoon [link] [comments]  ( 2 min )
    [D] Trace norm in KFAC paper for regularization
    Hi, I doubt that the trace norm of the Kronecker product is mistaken in the KFAC paper (https://arxiv.org/abs/1503.05671). Shouldn't the division in the blue mark be replaced by multiplication? https://preview.redd.it/quoxpzubfgt81.png?width=1241&format=png&auto=webp&s=19e7b60628302f3cb37ba42944088d89d7a7bd28 submitted by /u/Cautious_Proposal132 [link] [comments]  ( 1 min )
    [D] To what extent can Rust be used for Machine Learning?
    I recently saw that some parts of HuggingFace ecosystem use Rust under the hood, and HF is a large ecosystem. I've also heard from some of my friends that they had to learn Rust as a first thing in an ML company (it's their first job so they couldn't explain to me exactly why). My questions are: What are pros and cons over Python? Are there any good frameworks in Rust for ML? Are there a decent community & documentation for Rust? Is learning it a fun experience? Is it used only for deployment? The reason I'm asking this is that I really love to learn by doing. And so, if I engaged in learning a bit of Rust for ML purposes, would I be able to create something ML-like right of the bat? It can be something as simple as MNIST classifier Take note that I don't know anything about Rust, so these questions might seem noob-like. But I believe that the answers can be of help to others as well. submitted by /u/Icy_Fisherman7187 [link] [comments]  ( 5 min )
    [D] Tips for using Ensemble Learning with a small dataset
    I have started to look into using an ensemble of relatively shallow MLPs to predict using a small dataset (~100 training samples). I was looking specifically at bagging (bootstrap aggregation) as a possibility of improving prediction accuracy. I was curious if there were any heuristics for how many models to include in a bagging ensemble? Also, more generally, am I on the correct path, or is there a better direction given my situation? A different ensemble technique, or a different path all together? Any advice would be appreciated. submitted by /u/Fritos121 [link] [comments]  ( 1 min )
    [D] What JAX NN library to use?
    I've been exploring the jax ecosystem and its many neural network libraries but I can't seem to settle on one. The main 5 which i am considering are Trax, Objax, Equinox, Flax, and Elegy, however I would like to hear which jax NN lib you use and why. submitted by /u/Southern-Trip-1102 [link] [comments]
  • Open

    Locked-image Tuning: Adding Language Understanding to Image Models
    Posted by Andreas Steiner and Basil Mustafa, Research Software Engineers at Google Research, Brain team The ability to classify images into categories has been transformed by deep learning. It has also been significantly accelerated by transfer learning, whereby models are first pre-trained on large datasets, like ImageNet, to learn visual representations that are then transferred via fine-tuning to a new task with less data (e.g., classifying animals). Previous works such as BiT and ViT employed these methods to achieve state-of-the-art performance on a wide range of classification tasks, such as the VTAB benchmark. However, fine-tuning has some downsides: though pre-training is done only once, fine-tuning is necessary on every new dataset for which task-specific data is needed. Multimo…  ( 7 min )
  • Open

    questions to ask an AI?
    i recently played a game called tacoma that had a focus on AI and in the game there was a guide for AI that showed 4 hypotheticals to ask an AI to check it's morality and it got me thinking how useful that would be for a real self-aware intelligence so i want to make a list of questions/hypotheticals to ask AGIs if you had to interview a recently created sentient AI what questions or hypotheticals would you give it to gauge it's morality, intelligence, creativity, emotion etc.? submitted by /u/neonvolta [link] [comments]  ( 1 min )
    YouTuber Meets His Creepy Robot Double and Freaks Out
    submitted by /u/estasfuera [link] [comments]
    Reference request for applications of time to ai
    Does anyone know of any AI papers, books articles etc that discuss using a sense of time to develop AI, (especially real world time)? I've come across papers that discuss how having a sense of time seems to play a role in animal cognition (e.g. temporal cognition), and I'm curious to what extent this has influenced the development of AI. Thanks in advance submitted by /u/patterntheoryacc [link] [comments]  ( 1 min )
    IBM Data Science and AI Programs on Coursera Free for 30 Days
    submitted by /u/awsconsultant [link] [comments]
    Are you aware of these AI Ethical Challenges?
    submitted by /u/JencyJane [link] [comments]
    Synthetic²: Can AI Be A Powerful Force For Creation? | SiGMA/AGS UAE 2022
    submitted by /u/thedyezwfl [link] [comments]
    Google finance chief: "We automate everything that can be automated"
    submitted by /u/much_successes [link] [comments]
    Free Webinar series | Automated CV Pipelines | Instance Classification
    Automated CV Pipelines 3rd part is open for registration. It will be covering the methods of streamlining instance classification. If you are interested to check out, here is the link to register. submitted by /u/WeekendClassic [link] [comments]
    "My A.I. writes music better than humans. World-class education in A.I. + music -> decades of work -> censored from Facebook, Twitter, soon to be downvoted or unfairly-banned from Reddit. It's making the most beautiful music I've ever heard, and society despises it."
    Thirty years it's taken me, A.I. that is not just as good as humans but better than humans at composing music: https://i.imgur.com/hReXJq1.png It passes the Turing Test, and it is also a revolution in the field of music in and of itself. In the meantime, no one has said anything nice to me in thirty years; just insults. I would feel dumb rewarding humanity with my creation; it would send the wrong message; it would affirm their bad behavior. Garbage species. Low IQ. submitted by /u/PussyFiller2022 [link] [comments]  ( 1 min )
  • Open

    PPO with one worker always picking the best action?
    If I use PPO with distributed workers, and one of the workers always picks the best action, would that skew the PPO algorithm? It might perform a tad slower, but would it factually introduce wrong math? Perhaps because the PPO optimization requires that all actions are taking proportional to their probabilities? Or would it (mathematically) not matter? submitted by /u/tmuxed [link] [comments]  ( 1 min )
    Feedback Collection for FinRL: Financial Reinforcement Learning
    Dear all, As a creator of the open-source FinRL project, I would like to welcome all kinds of feedback regarding financial reinforcement learning, especially about how to improve the open-source project FinRL. After several years of development and maintenance, we have passed the phase of caring about #stars, now we care more about #downloads, also Wall Street's adoption. Appreciate your feedback and sharing! Previously when we exposed our message on Reddit, the community was not very supportive about open-source projects' "advertisements". Maybe it consumed public attention and raised bad feelings. Therefore, this time we created a reddit sub-channel for FinRL-related discussions, available at: https://www.reddit.com/r/AI4Finance_FinRL/ Best, Yang submitted by /u/Character-Meat-9176 [link] [comments]  ( 1 min )
    Is a steady linear increase in average reward during training too good to be true? Are there any common pitfalls?
    submitted by /u/C_BearHill [link] [comments]  ( 1 min )
    Determine Gridworld values with no probability
    I am learning Reinforcement learning for games following Gridworld examples. Apologies in advance if this is a basic question, very new to reinforcement learning. I am slightly confused in scenarios where probability of moving up, down, left and right are not provided or stated. In this scenario, I assume we assume the optimal policy and therefore, you would apply the Bellman equation as: V(s)=maxa(R(s,a)+γV(s′)) Cost for any movement is 0 and an agent can choose to terminate at a numbered grid to collect a reward amount of the grid number. This is why my square closest to the reward takes in the value 8 since it will terminate with the action to the next state to collect the reward. Would this be the correct way to determine the value for the surrounding grid squares? https://preview.redd.it/s9l0ok4kbgt81.png?width=806&format=png&auto=webp&s=dfb50450001541b0569d0361fd04a73daa29f222 submitted by /u/Artezian [link] [comments]  ( 2 min )
  • Open

    Three from MIT awarded 2022 Paul and Daisy Soros Fellowships for New Americans
    Fellowship funds graduate studies for outstanding immigrants and children of immigrants.  ( 6 min )
  • Open

    Latest Research From Stanford Introduces ‘Domino’: A Python Tool for Identifying and Describing Underperforming Slices in Machine Learning Models
    Machine learning and Artificial Intelligence models have gained promising results in recent years. The major factor behind their success is the availability and development of vast datasets. However, regardless of how many terabytes of data you have or how skilled you are at data science, machine learning models will be useless and even dangerous if you can’t make sense of data records. A slice is a collection of data samples with a common feature. For example, in a picture dataset, photographs of antique vehicles make up a slice. When a model’s performance on the data samples in a slice is significantly lower than its overall performance, the slice is considered underperforming. Deploying models underperforming on crucial data slices could seriously harm safety and fairness. For instance, models trained to detect collapsed lungs in chest X-rays generally make predictions based on the presence of chest drains, a common therapeutic device. As a result, computer models typically fail to detect collapsed lungs in images without chest drains, a critical data slice in which inaccurate negative predictions could be catastrophic. Not many studies have considered underperforming slices during model evaluation. Researchers believe that knowing which slices their models underperform would help practitioners not just make better decisions regarding model deployment but also improve model robustness by upgrading the training dataset or utilizing robust optimization strategies. Detecting slices is challenging because the “hidden” data slices are linked by a notion that isn’t easily derived from unstructured inputs or labeled in metadata (e.g., images, video, time-series data). Continue reading the summary Paper: https://arxiv.org/pdf/2203.14960.pdf Article: http://ai.stanford.edu/blog/domino/ Github: https://github.com/HazyResearch/domino submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    NN from Scratch: #3 Forward propagation | Kolbenkraft
    submitted by /u/cjmodi306 [link] [comments]
  • Open

    AI-generated easter eggs
    How would AI decorate an easter egg? I've tried this before by training an image-generating model exclusively on pictures of easter eggs I decorated (they came out plain, if a bit wobbly). I decided to see what I would get using a model based on CLIP, which has  ( 3 min )
    Bonus: What does the x-ray of an Easter egg look like?
    AI Weirdness: the strange side of machine learning  ( 1 min )
  • Open

    R-Learning AI self-taking over processes
    An inside look at how REINFORCEMENT learning, without past reference, extracts “optimal” decisions through simple interaction …  ( 10 min )
  • Open

    GFN Thursday Gears Up With More Electronic Arts Games on GeForce NOW
    This GFN Thursday delivers more gr-EA-t games as two new titles from Electronic Arts join the GeForce NOW library. Gamers can now enjoy Need for Speed HEAT  and Plants vs. Zombies Garden Warfare 2 streaming from GeForce NOW to underpowered PCs, Macs, Chromebooks, SHIELD TV and mobile devices. It’s all part of the eight  total Read article > The post GFN Thursday Gears Up With More Electronic Arts Games on GeForce NOW appeared first on NVIDIA Blog.  ( 3 min )
  • Open

    How AI is Changing Digital Marketing
    What is Artificial Intelligence? Oxford Languages defines AI as the theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages. For those of us working in the realm of digital marketing, the impact has become even more clear over… Read More »How AI is Changing Digital Marketing The post How AI is Changing Digital Marketing appeared first on Data Science Central.  ( 4 min )
    AI For Compliance: What, Why, How
    With the constant rise and use of technology, Artificial Intelligence (AI) has become a great companion to compliance. Compliance is one of the biggest playing fields and plays a pivotal role in banking institutions. It aims to identify, diminish, and manage risks such as insider trading, spoofing attacks, exploitation of the market, front-running, and more by… Read More »AI For Compliance: What, Why, How The post AI For Compliance: What, Why, How appeared first on Data Science Central.  ( 9 min )
    Benefits of Data Governance and Compliance
    While data compliance is the practice of organizations ensuring that all sensitive data is managed and organized in a way that enables them to meet their business rules alongside legal and governmental regulations, data governance involves the process of managing organizational data’s usability, security, availability, and quality using the internally set rules and policies. Data… Read More »Benefits of Data Governance and Compliance The post Benefits of Data Governance and Compliance appeared first on Data Science Central.  ( 3 min )
    How To Write A Technical Dissertation
    Technical dissertation writing sometimes seems impossible until it is done. A dissertation is among the lengthiest tasks that can take months to get completed. Thus, it exhausts students, but there is no way around it. It is worth more than about 60 credits in a thesis-based degree. Moreover, gathering proper knowledge and top guidelines about… Read More »How To Write A Technical Dissertation The post How To Write A Technical Dissertation appeared first on Data Science Central.  ( 5 min )
    Why Data Engineers are in Greater Demand than Data Scientists
    Globally, many think that data scientist is the best job after Harvard declared it to be one of the hottest jobs of the decade.  And since then, many have been choosing it as their career path. But the role of a data engineer is as important as the data scientist is, because if a data… Read More »Why Data Engineers are in Greater Demand than Data Scientists The post Why Data Engineers are in Greater Demand than Data Scientists appeared first on Data Science Central.  ( 3 min )
  • Open

    When to Go, and When to Explore: The Benefit of Post-Exploration in Intrinsic Motivation. (arXiv:2203.16311v2 [cs.LG] UPDATED)
    Go-Explore achieved breakthrough performance on challenging reinforcement learning (RL) tasks with sparse rewards. The key insight of Go-Explore was that successful exploration requires an agent to first return to an interesting state ('Go'), and only then explore into unknown terrain ('Explore'). We refer to such exploration after a goal is reached as 'post-exploration'. In this paper we present a systematic study of post-exploration, answering open questions that the Go-Explore paper did not answer yet. First, we study the isolated potential of post-exploration, by turning it on and off within the same algorithm. Subsequently, we introduce new methodology to adaptively decide when to post-explore and for how long to post-explore. Experiments on a range of MiniGrid environments show that post-exploration indeed boosts performance (with a bigger impact than tuning regular exploration parameters), and this effect is further enhanced by adaptively deciding when and for how long to post-explore. In short, our work identifies adaptive post-exploration as a promising direction for RL exploration research.
    Estimators of Entropy and Information via Inference in Probabilistic Models. (arXiv:2202.12363v2 [stat.ML] UPDATED)
    Estimating information-theoretic quantities such as entropy and mutual information is central to many problems in statistics and machine learning, but challenging in high dimensions. This paper presents estimators of entropy via inference (EEVI), which deliver upper and lower bounds on many information quantities for arbitrary variables in a probabilistic generative model. These estimators use importance sampling with proposal distribution families that include amortized variational inference and sequential Monte Carlo, which can be tailored to the target model and used to squeeze true information values with high accuracy. We present several theoretical properties of EEVI and demonstrate scalability and efficacy on two problems from the medical domain: (i) in an expert system for diagnosing liver disorders, we rank medical tests according to how informative they are about latent diseases, given a pattern of observed symptoms and patient attributes; and (ii) in a differential equation model of carbohydrate metabolism, we find optimal times to take blood glucose measurements that maximize information about a diabetic patient's insulin sensitivity, given their meal and medication schedule.
    Estimating permeability of 3D micro-CT images by physics-informed CNNs based on DNS. (arXiv:2109.01818v2 [cs.LG] UPDATED)
    In recent years, convolutional neural networks (CNNs) have experienced an increasing interest in their ability to perform a fast approximation of effective hydrodynamic parameters in porous media research and applications. This paper presents a novel methodology for permeability prediction from micro-CT scans of geological rock samples. The training data set for CNNs dedicated to permeability prediction consists of permeability labels that are typically generated by classical lattice Boltzmann methods (LBM) that simulate the flow through the pore space of the segmented image data. We instead perform direct numerical simulation (DNS) by solving the stationary Stokes equation in an efficient and distributed-parallel manner. As such, we circumvent the convergence issues of LBM that frequently are observed on complex pore geometries, and therefore, improve the generality and accuracy of our training data set. Using the DNS-computed permeabilities, a physics-informed CNN PhyCNN) is trained by additionally providing a tailored characteristic quantity of the pore space. More precisely, by exploiting the connection to flow problems on a graph representation of the pore space, additional information about confined structures is provided to the network in terms of the maximum flow value, which is the key innovative component of our workflow. The robustness of this approach is reflected by very high prediction accuracy, which is observed for a variety of sandstone samples from archetypal rock formations.
    Highly efficient reliability analysis of anisotropic heterogeneous slopes: Machine Learning aided Monte Carlo method. (arXiv:2204.06098v1 [cs.LG])
    Machine Learning (ML) algorithms are increasingly used as surrogate models to increase the efficiency of stochastic reliability analyses in geotechnical engineering. This paper presents a highly efficient ML aided reliability technique that is able to accurately predict the results of a Monte Carlo (MC) reliability study, and yet performs 500 times faster. A complete MC reliability analysis on anisotropic heterogeneous slopes consisting of 120,000 simulated samples is conducted in parallel to the proposed ML aided stochastic technique. Comparing the results of the complete MC study and the proposed ML aided technique, the expected errors of the proposed method are realistically examined. Circumventing the time-consuming computation of factors of safety for the training datasets, the proposed technique is more efficient than previous methods. Different ML models, including Random Forest (RF), Support Vector Machine (SVM) and Artificial Neural Networks (ANN) are presented, optimised and compared. The effects of the size and type of training and testing datasets are discussed. The expected errors of the ML predicted probability of failure are characterised by different levels of soil heterogeneity and anisotropy. Using only 1% of MC samples to train ML surrogate models, the proposed technique can accurately predict the probability of failure with mean errors limited to 0.7%. The proposed technique reduces the computational time required for our study from 306 days to only 14 hours, providing 500 times higher efficiency.
    OntoProtein: Protein Pretraining With Gene Ontology Embedding. (arXiv:2201.11147v3 [q-bio.BM] UPDATED)
    Self-supervised protein language models have proved their effectiveness in learning the proteins representations. With the increasing computational power, current protein language models pre-trained with millions of diverse sequences can advance the parameter scale from million-level to billion-level and achieve remarkable improvement. However, those prevailing approaches rarely consider incorporating knowledge graphs (KGs), which can provide rich structured knowledge facts for better protein representations. We argue that informative biology knowledge in KGs can enhance protein representation with external knowledge. In this work, we propose OntoProtein, the first general framework that makes use of structure in GO (Gene Ontology) into protein pre-training models. We construct a novel large-scale knowledge graph that consists of GO and its related proteins, and gene annotation texts or protein sequences describe all nodes in the graph. We propose novel contrastive learning with knowledge-aware negative sampling to jointly optimize the knowledge graph and protein embedding during pre-training. Experimental results show that OntoProtein can surpass state-of-the-art methods with pre-trained protein language models in TAPE benchmark and yield better performance compared with baselines in protein-protein interaction and protein function prediction. Code and datasets are available in https://github.com/zjunlp/OntoProtein.
    A streamable large-scale clinical EEG dataset for Deep Learning. (arXiv:2203.02552v2 [cs.LG] UPDATED)
    Deep Learning has revolutionized various fields, including Computer Vision, Natural Language Processing, as well as Biomedical research. Within the field of neuroscience, specifically in electrophysiological neuroimaging, researchers are starting to explore leveraging deep learning to make predictions on their data without extensive feature engineering. The availability of large-scale datasets is a crucial aspect of allowing the experimentation of Deep Learning models. We are publishing the first large-scale clinical EEG dataset that simplifies data access and management for Deep Learning. This dataset contains eyes-closed EEG data prepared from a collection of 1,574 juvenile participants from the Healthy Brain Network. We demonstrate a use case integrating this framework, and discuss why providing such neuroinformatics infrastructure to the community is critical for future scientific discoveries.
    Research on Intellectual Property Resource Profile and Evolution Law. (arXiv:2204.06221v1 [cs.DL])
    In the era of big data, intellectual property-oriented scientific and technological resources show the trend of large data scale, high information density and low value density, which brings severe challenges to the effective use of intellectual property resources, and the demand for mining hidden information in intellectual property is increasing. This makes intellectual property-oriented science and technology resource portraits and analysis of evolution become the current research hotspot. This paper sorts out the construction method of intellectual property resource intellectual portrait and its pre-work property entity extraction and entity completion from the aspects of algorithm classification and general process, and directions for improvement of future methods.
    Adjacency constraint for efficient hierarchical reinforcement learning. (arXiv:2111.00213v3 [cs.LG] UPDATED)
    Goal-conditioned Hierarchical Reinforcement Learning (HRL) is a promising approach for scaling up reinforcement learning (RL) techniques. However, it often suffers from training inefficiency as the action space of the high-level, i.e., the goal space, is large. Searching in a large goal space poses difficulty for both high-level subgoal generation and low-level policy learning. In this paper, we show that this problem can be effectively alleviated by restricting the high-level action space from the whole goal space to a $k$-step adjacent region of the current state using an adjacency constraint. We theoretically prove that in a deterministic Markov Decision Process (MDP), the proposed adjacency constraint preserves the optimal hierarchical policy, while in a stochastic MDP the adjacency constraint induces a bounded state-value suboptimality determined by the MDP's transition structure. We further show that this constraint can be practically implemented by training an adjacency network that can discriminate between adjacent and non-adjacent subgoals. Experimental results on discrete and continuous control tasks including challenging simulated robot locomotion and manipulation tasks show that incorporating the adjacency constraint significantly boosts the performance of state-of-the-art goal-conditioned HRL approaches.
    Learning from All Vehicles. (arXiv:2203.11934v2 [cs.RO] UPDATED)
    In this paper, we present a system to train driving policies from experiences collected not just from the ego-vehicle, but all vehicles that it observes. This system uses the behaviors of other agents to create more diverse driving scenarios without collecting additional data. The main difficulty in learning from other vehicles is that there is no sensor information. We use a set of supervisory tasks to learn an intermediate representation that is invariant to the viewpoint of the controlling vehicle. This not only provides a richer signal at training time but also allows more complex reasoning during inference. Learning how all vehicles drive helps predict their behavior at test time and can avoid collisions. We evaluate this system in closed-loop driving simulations. Our system outperforms all prior methods on the public CARLA Leaderboard by a wide margin, improving driving score by 25 and route completion rate by 24 points. Our method won the 2021 CARLA Autonomous Driving challenge. Code and data are available at https://github.com/dotchen/LAV.
    Optimal Membership Inference Bounds for Adaptive Composition of Sampled Gaussian Mechanisms. (arXiv:2204.06106v1 [cs.CR])
    Given a trained model and a data sample, membership-inference (MI) attacks predict whether the sample was in the model's training set. A common countermeasure against MI attacks is to utilize differential privacy (DP) during model training to mask the presence of individual examples. While this use of DP is a principled approach to limit the efficacy of MI attacks, there is a gap between the bounds provided by DP and the empirical performance of MI attacks. In this paper, we derive bounds for the \textit{advantage} of an adversary mounting a MI attack, and demonstrate tightness for the widely-used Gaussian mechanism. We further show bounds on the \textit{confidence} of MI attacks. Our bounds are much stronger than those obtained by DP analysis. For example, analyzing a setting of DP-SGD with $\epsilon=4$ would obtain an upper bound on the advantage of $\approx0.36$ based on our analyses, while getting bound of $\approx 0.97$ using the analysis of previous work that convert $\epsilon$ to membership inference bounds. Finally, using our analysis, we provide MI metrics for models trained on CIFAR10 dataset. To the best of our knowledge, our analysis provides the state-of-the-art membership inference bounds for the privacy.
    LDPC codes: comparing cluster graphs to factor graphs. (arXiv:2204.06350v1 [cs.IT])
    We present a comparison study between a cluster and factor graph representation of LDPC codes. In probabilistic graphical models, cluster graphs retain useful dependence between random variables during inference, which are advantageous in terms of computational cost, convergence speed, and accuracy of marginal probabilities. This study investigates these benefits in the context of LDPC codes and shows that a cluster graph representation outperforms the traditional factor graph representation.
    Deep Learning-based Framework for Automatic Cranial Defect Reconstruction and Implant Modeling. (arXiv:2204.06310v1 [eess.IV])
    The goal of this work is to propose a robust, fast, and fully automatic method for personalized cranial defect reconstruction and implant modeling. We propose a two-step deep learning-based method using a modified U-Net architecture to perform the defect reconstruction, and a dedicated iterative procedure to improve the implant geometry, followed by automatic generation of models ready for 3-D printing. We propose a cross-case augmentation based on imperfect image registration combining cases from different datasets. We perform ablation studies regarding different augmentation strategies and compare them to other state-of-the-art methods. We evaluate the method on three datasets introduced during the AutoImplant 2021 challenge, organized jointly with the MICCAI conference. We perform the quantitative evaluation using the Dice and boundary Dice coefficients, and the Hausdorff distance. The average Dice coefficient, boundary Dice coefficient, and the 95th percentile of Hausdorff distance are 0.91, 0.94, and 1.53 mm respectively. We perform an additional qualitative evaluation by 3-D printing and visualization in mixed reality to confirm the implant's usefulness. We propose a complete pipeline that enables one to create the cranial implant model ready for 3-D printing. The described method is a greatly extended version of the method that scored 1st place in all AutoImplant 2021 challenge tasks. We freely release the source code, that together with the open datasets, makes the results fully reproducible. The automatic reconstruction of cranial defects may enable manufacturing personalized implants in a significantly shorter time, possibly allowing one to perform the 3-D printing process directly during a given intervention. Moreover, we show the usability of the defect reconstruction in mixed reality that may further reduce the surgery time.
    Distributionally Robust Models with Parametric Likelihood Ratios. (arXiv:2204.06340v1 [cs.LG])
    As machine learning models are deployed ever more broadly, it becomes increasingly important that they are not only able to perform well on their training distribution, but also yield accurate predictions when confronted with distribution shift. The Distributionally Robust Optimization (DRO) framework proposes to address this issue by training models to minimize their expected risk under a collection of distributions, to imitate test-time shifts. This is most commonly achieved by instance-level re-weighting of the training objective to emulate the likelihood ratio with possible test distributions, which allows for estimating their empirical risk via importance sampling (assuming that they are subpopulations of the training distribution). However, re-weighting schemes in the literature are usually limited due to the difficulty of keeping the optimization problem tractable and the complexity of enforcing normalization constraints. In this paper, we show that three simple ideas -- mini-batch level normalization, a KL penalty and simultaneous gradient updates -- allow us to train models with DRO using a broader class of parametric likelihood ratios. In a series of experiments on both image and text classification benchmarks, we find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches, and that the method performs reliably well with little hyper-parameter tuning. Code to reproduce our experiments can be found at https://github.com/pmichel31415/P-DRO.
    Learning multiobjective rough terrain traversability. (arXiv:2203.16354v2 [cs.RO] UPDATED)
    We present a method that uses high-resolution topography data of rough terrain, and ground vehicle simulation, to predict traversability. Traversability is expressed as three independent measures: the ability to traverse the terrain at a target speed, energy consumption, and acceleration. The measures are continuous and reflect different objectives for planning that go beyond binary classification. A deep neural network is trained to predict the traversability measures from the local heightmap and target speed. To produce training data, we use an articulated vehicle with wheeled bogie suspensions and procedurally generated terrains. We evaluate the model on laser-scanned forest terrains, previously unseen by the model. The model predicts traversability with an accuracy of 90%. Predictions rely on features from the high-dimensional terrain data that surpass local roughness and slope relative to the heading. Correlations show that the three traversability measures are complementary to each other. With an inference speed 3000 times faster than the ground truth simulation and trivially parallelizable, the model is well suited for traversability analysis and optimal path planning over large areas.
    DL4SciVis: A State-of-the-Art Survey on Deep Learning for Scientific Visualization. (arXiv:2204.06504v1 [cs.GR])
    Since 2016, we have witnessed the tremendous growth of artificial intelligence+visualization (AI+VIS) research. However, existing survey papers on AI+VIS focus on visual analytics and information visualization, not scientific visualization (SciVis). In this paper, we survey related deep learning (DL) works in SciVis, specifically in the direction of DL4SciVis: designing DL solutions for solving SciVis problems. To stay focused, we primarily consider works that handle scalar and vector field data but exclude mesh data. We classify and discuss these works along six dimensions: domain setting, research task, learning type, network architecture, loss function, and evaluation metric. The paper concludes with a discussion of the remaining gaps to fill along the discussed dimensions and the grand challenges we need to tackle as a community. This state-of-the-art survey guides SciVis researchers in gaining an overview of this emerging topic and points out future directions to grow this research.
    Scalable Training of Language Models using JAX pjit and TPUv4. (arXiv:2204.06514v1 [cs.LG])
    Modern large language models require distributed training strategies due to their size. The challenges of efficiently and robustly training them are met with rapid developments on both software and hardware frontiers. In this technical report, we explore challenges and design decisions associated with developing a scalable training framework, and present a quantitative analysis of efficiency improvements coming from adopting new software and hardware solutions.
    Sentiment Analysis of Political Tweets for Israel using Machine Learning. (arXiv:2204.06515v1 [cs.IR])
    Sentiment Analysis is a vital research topic in the field of Computer Science. With the accelerated development of Information Technology and social networks, a massive amount of data related to comment texts has been generated on web applications or social media platforms like Twitter. Due to this, people have actively started proliferating general information and the information related to political opinions, which becomes an important reason for analyzing public reactions. Most researchers have used social media specifics or contents to analyze and predict public opinion concerning political events. This research proposes an analytical study using Israeli political Twitter data to interpret public opinion towards the Palestinian-Israeli conflict. The attitudes of ethnic groups and opinion leaders in the form of tweets are analyzed using Machine Learning algorithms like Support Vector Classifier (SVC), Decision Tree (DT), and Naive Bayes (NB). Finally, a comparative analysis is done based on experimental results from different models.
    Deep Probabilistic Time Series Forecasting using Augmented Recurrent Input for Dynamic Systems. (arXiv:2106.05848v2 [cs.LG] UPDATED)
    The demand of probabilistic time series forecasting has been recently raised in various dynamic system scenarios, for example, system identification and prognostic and health management of machines. To this end, we combine the advances in both deep generative models and state space model (SSM) to come up with a novel, data-driven deep probabilistic sequence model. Specifically, we follow the popular encoder-decoder generative structure to build the recurrent neural networks (RNN) assisted variational sequence model on an augmented recurrent input space, which could induce rich stochastic sequence dependency. Besides, in order to alleviate the inconsistency issue of the posterior between training and predicting as well as improving the mining of dynamic patterns, we (i) propose using a lagged hybrid output as input for the posterior at next time step, which brings training and predicting into alignment; and (ii) further devise a generalized auto-regressive strategy that encodes all the historical dependencies for the posterior. Thereafter, we first investigate the methodological characteristics of the proposed deep probabilistic sequence model on toy cases, and then comprehensively demonstrate the superiority of our model against existing deep probabilistic SSM models through extensive numerical experiments on eight system identification benchmarks from various dynamic systems. Finally, we apply our sequence model to a real-world centrifugal compressor forecasting problem, and again verify its outstanding performance by quantifying the time series predictive distribution.
    COIL: Constrained Optimization in Learned Latent Space -- Learning Representations for Valid Solutions. (arXiv:2202.02163v3 [cs.NE] UPDATED)
    Constrained optimization problems can be difficult because their search spaces have properties not conducive to search, e.g., multimodality, discontinuities, or deception. To address such difficulties, considerable research has been performed on creating novel evolutionary algorithms or specialized genetic operators. However, if the representation that defined the search space could be altered such that it only permitted valid solutions that satisfied the constraints, the task of finding the optimal would be made more feasible without any need for specialized optimization algorithms. We propose Constrained Optimization in Latent Space (COIL), which uses a VAE to generate a learned latent representation from a dataset comprising samples from the valid region of the search space according to a constraint, thus enabling the optimizer to find the objective in the new space defined by the learned representation. Preliminary experiments show promise: compared to an identical GA using a standard representation that cannot meet the constraints or find fit solutions, COIL with its learned latent representation can perfectly satisfy different types of constraints while finding high-fitness solutions.
    Rethinking Reconstruction Autoencoder-Based Out-of-Distribution Detection. (arXiv:2203.02194v2 [cs.CV] UPDATED)
    In some scenarios, classifier requires detecting out-of-distribution samples far from its training data. With desirable characteristics, reconstruction autoencoder-based methods deal with this problem by using input reconstruction error as a metric of novelty vs. normality. We formulate the essence of such approach as a quadruplet domain translation with an intrinsic bias to only query for a proxy of conditional data uncertainty. Accordingly, an improvement direction is formalized as maximumly compressing the autoencoder's latent space while ensuring its reconstructive power for acting as a described domain translator. From it, strategies are introduced including semantic reconstruction, data certainty decomposition and normalized L2 distance to substantially improve original methods, which together establish state-of-the-art performance on various benchmarks, e.g., the FPR@95%TPR of CIFAR-100 vs. TinyImagenet-crop on Wide-ResNet is 0.2%. Importantly, our method works without any additional data, hard-to-implement structure, time-consuming pipeline, and even harming the classification accuracy of known classes.
    Reinforced MOOCs Concept Recommendation in Heterogeneous Information Networks. (arXiv:2203.11011v2 [cs.IR] UPDATED)
    Massive open online courses (MOOCs), which provide a large-scale interactive participation and open access via the web, are becoming a modish way for online and distance education. To help users have a better study experience, many MOOC platforms have provided the services of recommending courses to users. However, we argue that directly recommending a course to users will ignore the expertise levels of different users. To fill this gap, this paper studies the problem of concept recommendation in a more fine-grained view. We propose a novel Heterogeneous Information Networks based Concept Recommender with Reinforcement Learning (HinCRec-RL) incorporated for concept recommendation in MOOCs. Specifically, we first formulate the concept recommendation in MOOCs as a reinforcement learning problem to better model the dynamic interaction among users and knowledge concepts. In addition, to mitigate the data sparsity issue which also exists in many other recommendation tasks, we consider a heterogeneous information network (HIN) among users, courses, videos and concepts, to better learn the semantic representation of users. In particular, we use the meta-paths on HIN to guide the propagation of users' preferences and propose a heterogeneous graph attention network to represent the meta-paths. To validate the effectiveness of our proposed approach, we conduct comprehensive experiments on a real-world dataset from XuetangX, a popular MOOC platform from China. The promising results show that our proposed approach can outperform other baselines.
    A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes. (arXiv:2204.06164v1 [eess.AS])
    In this paper, we propose a dynamic cascaded encoder Automatic Speech Recognition (ASR) model, which unifies models for different deployment scenarios. Moreover, the model can significantly reduce model size and power consumption without loss of quality. Namely, with the dynamic cascaded encoder model, we explore three techniques to maximally boost the performance of each model size: 1) Use separate decoders for each sub-model while sharing the encoders; 2) Use funnel-pooling to improve the encoder efficiency; 3) Balance the size of causal and non-causal encoders to improve quality and fit deployment constraints. Overall, the proposed large-medium model has 30% smaller size and reduces power consumption by 33%, compared to the baseline cascaded encoder model. The triple-size model that unifies the large, medium, and small models achieves 37% total size reduction with minimal quality loss, while substantially reducing the engineering efforts of having separate models.
    Adaptive Height Optimisation for Cellular-Connected UAVs using Reinforcement Learning. (arXiv:2007.13695v3 [eess.SP] UPDATED)
    Providing reliable connectivity to cellular-connected UAV can be very challenging; their performance highly depends on the nature of the surrounding environment, such as density and heights of the ground BSs. On the other hand, tall buildings might block undesired interference signals from ground BSs, thereby improving the connectivity between the UAVs and their serving BSs. To address the connectivity of UAVs in such environments, this paper proposes a RL algorithm to dynamically optimise the height of a UAV as it moves through the environment, with the goal of increasing the throughput or spectrum efficiency that it experiences. The proposed solution is evaluated in two settings: using a series of generated environments where we vary the number of BS and building densities, and in a scenario using real-world data obtained from an experiment in Dublin, Ireland. Results show that our proposed RL-based solution improves UAVs QoS by 6% to 41%, depending on the scenario. We also conclude that, when flying at heights higher than the buildings, building density variation has no impact on UAV QoS. On the other hand, BS density can negatively impact UAV QoS, with higher numbers of BSs generating more interference and deteriorating UAV performance.
    QU-net++: Image Quality Detection Framework for Segmentation of Medical 3D Image Stacks. (arXiv:2110.14181v4 [eess.IV] UPDATED)
    Automated segmentation of pathological regions of interest aids medical image diagnostics and follow-up care. However, accurate pathological segmentations require high quality of annotated data that can be both cost and time intensive to generate. In this work, we propose an automated two-step method that detects a minimal image subset required to train segmentation models by evaluating the quality of medical images from 3D image stacks using a U-net++ model. These images that represent a lack of quality training can then be annotated and used to fully train a U-net-based segmentation model. The proposed QU-net++ model detects this lack of quality training based on the disagreement in segmentations produced from the final two output layers. The proposed model isolates around 10% of the slices per 3D image stack and can scale across imaging modalities to segment cysts in OCT images and ground glass opacity (GGO) in lung CT images with Dice scores in the range 0.56-0.72. Thus, the proposed method can be applied for cost effective multi-modal pathology segmentation tasks.
    Why KDAC? A general activation function for knowledge discovery. (arXiv:2111.13858v3 [cs.LG] UPDATED)
    Deep learning oriented named entity recognition (DNER) has gradually become the paradigm of knowledge discovery, which greatly promotes domain intelligence. However, the current activation function of DNER fails to treat gradient vanishing, no negative output or non-differentiable existence, which may impede knowledge exploration caused by the omission and incomplete representation of latent semantics. To break through the dilemma, we present a novel activation function termed KDAC. Detailly, KDAC is an aggregation function with multiple conversion modes. The backbone of the activation region is the interaction between exponent and linearity, and the both ends extend through adaptive linear divergence, which surmounts the obstacle of gradient vanishing and no negative output. Crucially, the non-differentiable points are alerted and eliminated by an approximate smoothing algorithm. KDAC has a series of brilliant properties, including nonlinear, stable near-linear transformation and derivative, as well as dynamic style, etc. We perform experiments based on BERT-BiLSTM-CNN-CRF model on six benchmark datasets containing different domain knowledge, such as Weibo, Clinical, E-commerce, Resume, HAZOP and People's daily. The evaluation results show that KDAC is advanced and effective, and can provide more generalized activation to stimulate the performance of DNER. We hope that KDAC can be exploited as a promising activation function to devote itself to the construction of knowledge.
    Aspirations and Practice of Model Documentation: Moving the Needle with Nudging and Traceability. (arXiv:2204.06425v1 [cs.SE])
    Machine learning models have been widely developed, released, and adopted in numerous applications. Meanwhile, the documentation practice for machine learning models often falls short of established practices for traditional software components, which impedes model accountability, inadvertently abets inappropriate or misuse of models, and may trigger negative social impact. Recently, model cards, a template for documenting machine learning models, have attracted notable attention, but their impact on the practice of model documentation is unclear. In this work, we examine publicly available model cards and other similar documentation. Our analysis reveals a substantial gap between the suggestions made in the original model card work and the content in actual documentation. Motivated by this observation and literature on fields such as software documentation, interaction design, and traceability, we further propose a set of design guidelines that aim to support the documentation practice for machine learning models including (1) the collocation of documentation environment with the coding environment, (2) nudging the consideration of model card sections during model development, and (3) documentation derived from and traced to the source. We designed a prototype tool named DocML following those guidelines to support model development in computational notebooks. A lab study reveals the benefit of our tool to shift the behavior of data scientists towards documentation quality and accountability.
    Modelling Evolutionary and Stationary User Preferences for Temporal Sets Prediction. (arXiv:2204.05490v2 [cs.LG] UPDATED)
    Given a sequence of sets, where each set is associated with a timestamp and contains an arbitrary number of elements, the task of temporal sets prediction aims to predict the elements in the subsequent set. Previous studies for temporal sets prediction mainly capture each user's evolutionary preference by learning from his/her own sequence. Although insightful, we argue that: 1) the collaborative signals latent in different users' sequences are essential but have not been exploited; 2) users also tend to show stationary preferences while existing methods fail to consider. To this end, we propose an integrated learning framework to model both the evolutionary and the stationary preferences of users for temporal sets prediction, which first constructs a universal sequence by chronologically arranging all the user-set interactions, and then learns on each user-set interaction. In particular, for each user-set interaction, we first design an evolutionary user preference modelling component to track the user's time-evolving preference and exploit the latent collaborative signals among different users. This component maintains a memory bank to store memories of the related user and elements, and continuously updates their memories based on the currently encoded messages and the past memories. Then, we devise a stationary user preference modelling module to discover each user's personalized characteristics according to the historical sequence, which adaptively aggregates the previously interacted elements from dual perspectives with the guidance of the user's and elements' embeddings. Finally, we develop a set-batch algorithm to improve the model efficiency, which can create time-consistent batches in advance and achieve 3.5x training speedups on average. Experiments on real-world datasets demonstrate the effectiveness and good interpretability of our approach.
    AHP: Learning to Negative Sample for Hyperedge Prediction. (arXiv:2204.06353v1 [cs.LG])
    Hypergraphs (i.e., sets of hyperedges) naturally represent group relations (e.g., researchers co-authoring a paper and ingredients used together in a recipe), each of which corresponds to a hyperedge (i.e., a subset of nodes). Predicting future or missing hyperedges bears significant implication for many applications (e.g., collaboration and recipe recommendation). What makes hyperedge prediction particularly challenging is the vast number of non-hyperedge subsets, which grows exponentially with the number of nodes. Since it is prohibitive to use all of them as negative examples for model training, it is inevitable to sample a very small portion of them, and to this end, heuristic sampling schemes have been employed. However, trained models suffer from poor generalization capability for examples of different natures. In this paper, we propose AHP, an adversarial training-based hyperedge-prediction method. It learns to sample negative examples without relying on any heuristic schemes. Using six real hypergraphs, we show that AHP generalizes better to negative examples of various natures. It yields up to 28.2% higher AUROC than best existing methods and often even outperforms its variants with sampling schemes tailored to test sets.
    Label Augmentation with Reinforced Labeling for Weak Supervision. (arXiv:2204.06436v1 [cs.LG])
    Weak supervision (WS) is an alternative to the traditional supervised learning to address the need for ground truth. Data programming is a practical WS approach that allows programmatic labeling data samples using labeling functions (LFs) instead of hand-labeling each data point. However, the existing approach fails to fully exploit the domain knowledge encoded into LFs, especially when the LFs' coverage is low. This is due to the common data programming pipeline that neglects to utilize data features during the generative process. This paper proposes a new approach called reinforced labeling (RL). Given an unlabeled dataset and a set of LFs, RL augments the LFs' outputs to cases not covered by LFs based on similarities among samples. Thus, RL can lead to higher labeling coverage for training an end classifier. The experiments on several domains (classification of YouTube comments, wine quality, and weather prediction) result in considerable gains. The new approach produces significant performance improvement, leading up to +21 points in accuracy and +61 points in F1 scores compared to the state-of-the-art data programming approach.
    Clinical trial site matching with improved diversity using fair policy learning. (arXiv:2204.06501v1 [cs.LG])
    The ongoing pandemic has highlighted the importance of reliable and efficient clinical trials in healthcare. Trial sites, where the trials are conducted, are chosen mainly based on feasibility in terms of medical expertise and access to a large group of patients. More recently, the issue of diversity and inclusion in clinical trials is gaining importance. Different patient groups may experience the effects of a medical drug/ treatment differently and hence need to be included in the clinical trials. These groups could be based on ethnicity, co-morbidities, age, or economic factors. Thus, designing a method for trial site selection that accounts for both feasibility and diversity is a crucial and urgent goal. In this paper, we formulate this problem as a ranking problem with fairness constraints. Using principles of fairness in machine learning, we learn a model that maps a clinical trial description to a ranked list of potential trial sites. Unlike existing fairness frameworks, the group membership of each trial site is non-binary: each trial site may have access to patients from multiple groups. We propose fairness criteria based on demographic parity to address such a multi-group membership scenario. We test our method on 480 real-world clinical trials and show that our model results in a list of potential trial sites that provides access to a diverse set of patients while also ensuing a high number of enrolled patients.
    Distilling the Knowledge of Romanian BERTs Using Multiple Teachers. (arXiv:2112.12650v3 [cs.CL] UPDATED)
    Running large-scale pre-trained language models in computationally constrained environments remains a challenging problem yet to be addressed, while transfer learning from these models has become prevalent in Natural Language Processing tasks. Several solutions, including knowledge distillation, network quantization, or network pruning have been previously proposed; however, these approaches focus mostly on the English language, thus widening the gap when considering low-resource languages. In this work, we introduce three light and fast versions of distilled BERT models for the Romanian language: Distil-BERT-base-ro, Distil-RoBERT-base, and DistilMulti-BERT-base-ro. The first two models resulted from the individual distillation of knowledge from two base versions of Romanian BERTs available in literature, while the last one was obtained by distilling their ensemble. To our knowledge, this is the first attempt to create publicly available Romanian distilled BERT models, which were thoroughly evaluated on five tasks: part-of-speech tagging, named entity recognition, sentiment analysis, semantic textual similarity, and dialect identification. Our experimental results argue that the three distilled models offer performance comparable to their teachers, while being twice as fast on a GPU and ~35% smaller. In addition, we further test the similarity between the predictions of our students versus their teachers by measuring their label and probability loyalty, together with regression loyalty - a new metric introduced in this work.
    Estimation of stellar atmospheric parameters from LAMOST DR8 low-resolution spectra with 20$\leq$SNR$<$30. (arXiv:2204.06301v1 [astro-ph.GA])
    The accuracy of the estimated stellar atmospheric parameter decreases evidently with the decreasing of spectral signal-to-noise ratio (SNR) and there are a huge amount of this kind observations, especially in case of SNR$<$30. Therefore, it is helpful to improve the parameter estimation performance for these spectra and this work studied the ($T_\texttt{eff}, \log~g$, [Fe/H]) estimation problem for LAMOST DR8 low-resolution spectra with 20$\leq$SNR$<$30. We proposed a data-driven method based on machine learning techniques. Firstly, this scheme detected stellar atmospheric parameter-sensitive features from spectra by the Least Absolute Shrinkage and Selection Operator (LASSO), rejected ineffective data components and irrelevant data. Secondly, a Multi-layer Perceptron (MLP) method was used to estimate stellar atmospheric parameters from the LASSO features. Finally, the performance of the LASSO-MLP was evaluated by computing and analyzing the consistency between its estimation and the reference from the APOGEE (Apache Point Observatory Galactic Evolution Experiment) high-resolution spectra. Experiments show that the Mean Absolute Errors (MAE) of $T_\texttt{eff}, \log~g$, [Fe/H] are reduced from the LASP (137.6 K, 0.195 dex, 0.091 dex) to LASSO-MLP (84.32 K, 0.137 dex, 0.063 dex), which indicate evident improvements on stellar atmospheric parameter estimation. In addition, this work estimated the stellar atmospheric parameters for 1,162,760 low-resolution spectra with 20$\leq$SNR$<$30 from LAMOST DR8 using LASSO-MLP, and released the estimation catalog, learned model, experimental code, trained model, training data and test data for scientific exploration and algorithm study.
    Autonomy and Perception for Space Mining. (arXiv:2109.12109v3 [cs.RO] UPDATED)
    Future Moon bases will likely be constructed using resources mined from the surface of the Moon. The difficulty of maintaining a human workforce on the Moon and communications lag with Earth means that mining will need to be conducted using collaborative robots with a high degree of autonomy. In this paper, we describe our solution for Phase 2 of the NASA Space Robotics Challenge, which provided a simulated lunar environment in which teams were tasked to develop software systems to achieve autonomous collaborative robots for mining on the Moon. Our 3rd place and innovation award winning solution shows how machine learning-enabled vision could alleviate major challenges posed by the lunar environment towards autonomous space mining, chiefly the lack of satellite positioning systems, hazardous terrain, and delicate robot interactions. A robust multi-robot coordinator was also developed to achieve long-term operation and effective collaboration between robots.
    A Statistical Learning View of Simple Kriging. (arXiv:2202.07365v3 [stat.ML] UPDATED)
    In the Big Data era, with the ubiquity of geolocation sensors in particular, massive datasets exhibiting a possibly complex spatial dependence structure are becoming increasingly available. In this context, the standard probabilistic theory of statistical learning does not apply directly and guarantees of the generalization capacity of predictive rules learned from such data are left to establish. We analyze here the simple Kriging task, the flagship problem in Geostatistics: the values of a square integrable random field $X=\{X_s\}_{s\in S}$, $S\subset \mathbb{R}^2$, with unknown covariance structure are to be predicted with minimum quadratic risk, based upon observing a single realization of the spatial process at a finite number of locations $s_1,\; \ldots,\; s_n$ in $S$. Despite the connection of this minimization problem with kernel ridge regression, establishing the generalization capacity of empirical risk minimizers is far from straightforward, due to the non i.i.d. nature of the spatial data $X_{s_1},\; \ldots,\; X_{s_n}$ involved. In this article, nonasymptotic bounds of order $O_{\mathbb{P}}(1/n)$ are proved for the excess risk of a plug-in predictive rule mimicking the true minimizer in the case of isotropic stationary Gaussian processes observed at locations forming a regular grid. These theoretical results, as well as the role played by the technical conditions required to establish them, are illustrated by various numerical experiments and hopefully pave the way for further developments in statistical learning based on spatial data.
    Generalization Error Bounds for Multiclass Sparse Linear Classifiers. (arXiv:2204.06264v1 [math.ST])
    We consider high-dimensional multiclass classification by sparse multinomial logistic regression. Unlike binary classification, in the multiclass setup one can think about an entire spectrum of possible notions of sparsity associated with different structural assumptions on the regression coefficients matrix. We propose a computationally feasible feature selection procedure based on penalized maximum likelihood with convex penalties capturing a specific type of sparsity at hand. In particular, we consider global sparsity, double row-wise sparsity, and low-rank sparsity, and show that with the properly chosen tuning parameters the derived plug-in classifiers attain the minimax generalization error bounds (in terms of misclassification excess risk) within the corresponding classes of multiclass sparse linear classifiers. The developed approach is general and can be adapted to other types of sparsity as well.
    Disentangling Autoencoders (DAE). (arXiv:2202.09926v2 [cs.LG] UPDATED)
    Noting the importance of factorizing (or disentangling) the latent space, we propose a novel, non-probabilistic disentangling framework for autoencoders, based on the principles of symmetry transformations in group-theory. To the best of our knowledge, this is the first deterministic model that is aiming to achieve disentanglement based on autoencoders without regularizers. The proposed model is compared to seven state-of-the-art generative models based on autoencoders and evaluated based on five supervised disentanglement metrics. The experimental results show that the proposed model can have better disentanglement when variances of each features are different. We believe that this model leads to a new field for disentanglement learning based on autoencoders without regularizers.
    Reinforcement Learning on Graph: A Survey. (arXiv:2204.06127v1 [cs.LG])
    Graph mining tasks arise from many different application domains, ranging from social networks, transportation, E-commerce, etc., which have been receiving great attention from the theoretical and algorithm design communities in recent years, and there has been some pioneering work using the hotly researched reinforcement learning (RL) techniques to address graph data mining tasks. However, these graph mining algorithms and RL models are dispersed in different research areas, which makes it hard to compare different algorithms with each other. In this survey, we provide a comprehensive overview of RL models and graph mining and generalize these algorithms to Graph Reinforcement Learning (GRL) as a unified formulation. We further discuss the applications of GRL methods across various domains and summarize the method description, open-source codes, and benchmark datasets of GRL methods. Finally, we propose possible important directions and challenges to be solved in the future. This is the latest work on a comprehensive survey of GRL literature, and this work provides a global view for researchers as well as a learning resource for researchers outside the domain. In addition, we create an online open-source for both interested researchers who want to enter this rapidly developing domain and experts who would like to compare GRL methods.
    Production federated keyword spotting via distillation, filtering, and joint federated-centralized training. (arXiv:2204.06322v1 [eess.AS])
    We trained a keyword spotting model using federated learning on real user devices and observed significant improvements when the model was deployed for inference on phones. To compensate for data domains that are missing from on-device training caches, we employed joint federated-centralized training. And to learn in the absence of curated labels on-device, we formulated a confidence filtering strategy based on user-feedback signals for federated distillation. These techniques created models that significantly improved quality metrics in offline evaluations and user-experience metrics in live A/B experiments.
    Modeling and Analysis of Intermittent Federated Learning Over Cellular-Connected UAV Networks. (arXiv:2110.07077v3 [cs.LG] UPDATED)
    Federated learning (FL) is a promising distributed learning technique particularly suitable for wireless learning scenarios since it can accomplish a learning task without raw data transportation so as to preserve data privacy and lower network resource consumption. However, current works on FL over wireless networks do not profoundly study the fundamental performance of FL over wireless networks that suffers from communication outage due to channel impairment and network interference. To accurately exploit the performance of FL over wireless networks, this paper proposes a novel intermittent FL model over a cellular-connected unmanned aerial vehicle (UAV) network, which characterizes communication outage from UAV (clients) to their server and data heterogeneity among the datasets at UAVs. We propose an analytically tractable framework to derive the uplink outage probability and use it to devise a simulation-based approach so as to evaluate the performance of the proposed intermittent FL model. Our findings reveal how the intermittent FL model is impacted by uplink communication outage and UAV deployment. Extensive numerical simulations are provided to show the consistency between the simulated and analytical performances of the proposed intermittent FL model.
    FederatedScope-GNN: Towards a Unified, Comprehensive and Efficient Package for Federated Graph Learning. (arXiv:2204.05562v2 [cs.LG] UPDATED)
    The incredible development of federated learning (FL) has benefited various tasks in the domains of computer vision and natural language processing, and the existing frameworks such as TFF and FATE has made the deployment easy in real-world applications. However, federated graph learning (FGL), even though graph data are prevalent, has not been well supported due to its unique characteristics and requirements. The lack of FGL-related framework increases the efforts for accomplishing reproducible research and deploying in real-world applications. Motivated by such strong demand, in this paper, we first discuss the challenges in creating an easy-to-use FGL package and accordingly present our implemented package FederatedScope-GNN (FS-G), which provides (1) a unified view for modularizing and expressing FGL algorithms; (2) comprehensive DataZoo and ModelZoo for out-of-the-box FGL capability; (3) an efficient model auto-tuning component; and (4) off-the-shelf privacy attack and defense abilities. We validate the effectiveness of FS-G by conducting extensive experiments, which simultaneously gains many valuable insights about FGL for the community. Moreover, we employ FS-G to serve the FGL application in real-world E-commerce scenarios, where the attained improvements indicate great potential business benefits. We publicly release FS-G, as submodules of FederatedScope, at https://github.com/alibaba/FederatedScope to promote FGL's research and enable broad applications that would otherwise be infeasible due to the lack of a dedicated package.
    Baseline Computation for Attribution Methods Based on Interpolated Inputs. (arXiv:2204.06120v1 [cs.CV])
    We discuss a way to find a well behaved baseline for attribution methods that work by feeding a neural network with a sequence of interpolated inputs between two given inputs. Then, we test it with our novel Riemann-Stieltjes Integrated Gradient-weighted Class Activation Mapping (RSI-Grad-CAM) attribution method.
    Encoding Domain Knowledge in Multi-view Latent Variable Models: A Bayesian Approach with Structured Sparsity. (arXiv:2204.06242v1 [stat.ML])
    Many real-world systems are described not only by data from a single source but via multiple data views. For example, in genomic medicine, a patient can be described by data from different molecular layers. This raises the need for multi-view models that are able to disentangle variation within and across data views in an interpretable manner. Latent variable models with structured sparsity are a commonly used tool to address this modeling task but interpretability is cumbersome since it requires a direct inspection and interpretation of each factor via a specialized domain expert. Here, we propose MuVI, a novel approach for domain-informed multi-view latent variable models, facilitating the analysis of multi-view data in an inherently explainable manner. We demonstrate that our model (i) is able to integrate noisy domain expertise in form of feature sets, (ii) is robust to noise in the encoded domain knowledge, (iii) results in identifiable factors and (iv) is able to infer interpretable and biologically meaningful axes of variation in a real-world multi-view dataset of cancer patients.
    A quantum generative model for multi-dimensional time series using Hamiltonian learning. (arXiv:2204.06150v1 [quant-ph])
    Synthetic data generation has proven to be a promising solution for addressing data availability issues in various domains. Even more challenging is the generation of synthetic time series data, where one has to preserve temporal dynamics, i.e., the generated time series must respect the original relationships between variables across time. Recently proposed techniques such as generative adversarial networks (GANs) and quantum-GANs lack the ability to attend to the time series specific temporal correlations adequately. We propose using the inherent nature of quantum computers to simulate quantum dynamics as a technique to encode such features. We start by assuming that a given time series can be generated by a quantum process, after which we proceed to learn that quantum process using quantum machine learning. We then use the learned model to generate out-of-sample time series and show that it captures unique and complex features of the learned time series. We also study the class of time series that can be modeled using this technique. Finally, we experimentally demonstrate the proposed algorithm on an 11-qubit trapped-ion quantum machine.
    Organization of a Latent Space structure in VAE/GAN trained by navigation data. (arXiv:2102.01852v3 [cs.LG] UPDATED)
    We present a novel artificial cognitive mapping system using generative deep neural networks, called variational autoencoder/generative adversarial network (VAE/GAN), which can map input images to latent vectors and generate temporal sequences internally. The results show that the distance of the predicted image is reflected in the distance of the corresponding latent vector after training. This indicates that the latent space is self-organized to reflect the proximity structure of the dataset and may provide a mechanism through which many aspects of cognition are spatially represented. The present study allows the network to internally generate temporal sequences that are analogous to the hippocampal replay/pre-play ability, where VAE produces only near-accurate replays of past experiences, but by introducing GANs, the generated sequences are coupled with instability and novelty.
    FactGraph: Evaluating Factuality in Summarization with Semantic Graph Representations. (arXiv:2204.06508v1 [cs.CL])
    Despite recent improvements in abstractive summarization, most current approaches generate summaries that are not factually consistent with the source document, severely restricting their trust and usage in real-world applications. Recent works have shown promising improvements in factuality error identification using text or dependency arc entailments; however, they do not consider the entire semantic graph simultaneously. To this end, we propose FactGraph, a method that decomposes the document and the summary into structured meaning representations (MR), which are more suitable for factuality evaluation. MRs describe core semantic concepts and their relations, aggregating the main content in both document and summary in a canonical form, and reducing data sparsity. FactGraph encodes such graphs using a graph encoder augmented with structure-aware adapters to capture interactions among the concepts based on the graph connectivity, along with text representations using an adapter-based text encoder. Experiments on different benchmarks for evaluating factuality show that FactGraph outperforms previous approaches by up to 15%. Furthermore, FactGraph improves performance on identifying content verifiability errors and better captures subsentence-level factual inconsistencies.
    Challenges and Opportunities of Edge AI for Next-Generation Implantable BMIs. (arXiv:2204.02362v2 [cs.AI] UPDATED)
    Neuroscience and neurotechnology are currently being revolutionized by artificial intelligence (AI) and machine learning. AI is widely used to study and interpret neural signals (analytical applications), assist people with disabilities (prosthetic applications), and treat underlying neurological symptoms (therapeutic applications). In this brief, we will review the emerging opportunities of on-chip AI for the next-generation implantable brain-machine interfaces (BMIs), with a focus on state-of-the-art prosthetic BMIs. Major technological challenges for the effectiveness of AI models will be discussed. Finally, we will present algorithmic and IC design solutions to enable a new generation of AI-enhanced and high-channel-count BMIs.
    On the dynamics of credit history and social interaction features, and their impact on creditworthiness assessment performance. (arXiv:2204.06122v1 [cs.SI])
    For more than a half-century, credit risk management has used credit scoring models in each of its well-defined stages to manage credit risk. Application scoring is used to decide whether to grant a credit or not, while behavioral scoring is used mainly for portfolio management and to take preventive actions in case of default signals. In both cases, network data has recently been shown to be valuable to increase the predictive power of these models, especially when the borrower's historical data is scarce or not available. This study aims to understand the creditworthiness assessment performance dynamics and how it is influenced by the credit history, repayment behavior, and social network features. To accomplish this, we introduced a machine learning classification framework to analyze 97.000 individuals and companies from the moment they obtained their first loan to 12 months afterward. Our novel and massive dataset allow us to characterize each borrower according to their credit behavior, and social and economic relationships. Our research shows that borrowers' history increases performance at a decreasing rate during the first six months and then stabilizes. The most notable effect on perfomance of social networks features occurs at loan application; in personal scoring, this effect prevails a few months, while in business scoring adds value throughout the study period. These findings are of great value to improve credit risk management and optimize the use of traditional information and alternative data sources.
    Receptive Field Analysis of Temporal Convolutional Networks for Monaural Speech Dereverberation. (arXiv:2204.06439v1 [cs.SD])
    Speech dereverberation is often an important requirement in robust speech processing tasks. Supervised deep learning (DL) models give state-of-the-art performance for single-channel speech dereverberation. Temporal convolutional networks (TCNs) are commonly used for sequence modelling in speech enhancement tasks. A feature of TCNs is that they have a receptive field (RF) dependant on the specific model configuration which determines the number of input frames that can be observed to produce an individual output frame. It has been shown that TCNs are capable of performing dereverberation of simulated speech data, however a thorough analysis, especially with focus on the RF is yet lacking in the literature. This paper analyses dereverberation performance depending on the model size and the RF of TCNs. Experiments using the WHAMR corpus which is extended to include room impulse responses (RIRs) with larger T60 values demonstrate that a larger RF can have significant improvement in performance when training smaller TCN models. It is also demonstrated that TCNs benefit from a wider RF when dereverberating RIRs with larger RT60 values.
    Discovering Diverse Solutions in Deep Reinforcement Learning by Maximizing State-Action-Based Mutual Information. (arXiv:2103.07084v2 [stat.ML] UPDATED)
    Reinforcement learning algorithms are typically limited to learning a single solution for a specified task, even though diverse solutions often exist. Recent studies showed that learning a set of diverse solutions is beneficial because diversity enables robust few-shot adaptation. Although existing methods learn diverse solutions by using the mutual information as unsupervised rewards, such an approach often suffers from the bias of the gradient estimator induced by value function approximation. In this study, we propose a novel method that can learn diverse solutions without suffering the bias problem. In our method, a policy conditioned on a continuous or discrete latent variable is trained by directly maximizing the variational lower bound of the mutual information, instead of using the mutual information as unsupervised rewards as in previous studies. Through extensive experiments on robot locomotion tasks, we demonstrate that the proposed method successfully learns an infinite set of diverse solutions by learning continuous latent variables, which is more challenging than learning a finite number of solutions. Subsequently, we show that our method enables more effective few-shot adaptation compared with existing methods.
    A pipeline and comparative study of 12 machine learning models for text classification. (arXiv:2204.06518v1 [cs.IR])
    Text-based communication is highly favoured as a communication method, especially in business environments. As a result, it is often abused by sending malicious messages, e.g., spam emails, to deceive users into relaying personal information, including online accounts credentials or banking details. For this reason, many machine learning methods for text classification have been proposed and incorporated into the services of most email providers. However, optimising text classification algorithms and finding the right tradeoff on their aggressiveness is still a major research problem. We present an updated survey of 12 machine learning text classifiers applied to a public spam corpus. A new pipeline is proposed to optimise hyperparameter selection and improve the models' performance by applying specific methods (based on natural language processing) in the preprocessing stage. Our study aims to provide a new methodology to investigate and optimise the effect of different feature sizes and hyperparameters in machine learning classifiers that are widely used in text classification problems. The classifiers are tested and evaluated on different metrics including F-score (accuracy), precision, recall, and run time. By analysing all these aspects, we show how the proposed pipeline can be used to achieve a good accuracy towards spam filtering on the Enron dataset, a widely used public email corpus. Statistical tests and explainability techniques are applied to provide a robust analysis of the proposed pipeline and interpret the classification outcomes of the 12 machine learning models, also identifying words that drive the classification results. Our analysis shows that it is possible to identify an effective machine learning model to classify the Enron dataset with an F-score of 94%.
    Towards Practical Robustness Analysis for DNNs based on PAC-Model Learning. (arXiv:2101.10102v2 [cs.LG] UPDATED)
    To analyse local robustness properties of deep neural networks (DNNs), we present a practical framework from a model learning perspective. Based on black-box model learning with scenario optimisation, we abstract the local behaviour of a DNN via an affine model with the probably approximately correct (PAC) guarantee. From the learned model, we can infer the corresponding PAC-model robustness property. The innovation of our work is the integration of model learning into PAC robustness analysis: that is, we construct a PAC guarantee on the model level instead of sample distribution, which induces a more faithful and accurate robustness evaluation. This is in contrast to existing statistical methods without model learning. We implement our method in a prototypical tool named DeepPAC. As a black-box method, DeepPAC is scalable and efficient, especially when DNNs have complex structures or high-dimensional inputs. We extensively evaluate DeepPAC, with 4 baselines (using formal verification, statistical methods, testing and adversarial attack) and 20 DNN models across 3 datasets, including MNIST, CIFAR-10, and ImageNet. It is shown that DeepPAC outperforms the state-of-the-art statistical method PROVERO, and it achieves more practical robustness analysis than the formal verification tool ERAN. Also, its results are consistent with existing DNN testing work like DeepGini.
    Keys to Accurate Feature Extraction Using Residual Spiking Neural Networks. (arXiv:2111.05955v3 [cs.LG] UPDATED)
    Spiking neural networks (SNNs) have become an interesting alternative to conventional artificial neural networks (ANN) thanks to their temporal processing capabilities and energy efficient implementations in neuromorphic hardware. However the challenges involved in training SNNs have limited their performance in terms of accuracy and thus their applications. Improving learning algorithms and neural architectures for a more accurate feature extraction is therefore one of the current priorities in SNN research. In this paper we present a study on the key components of modern spiking architectures. We empirically compare different techniques in image classification datasets taken from the best performing networks. We design a spiking version of the successful residual network architecture and provide an in-depth study on the possible implementations of spiking residual connections. Our results provide a state of the art guide to SNN design, which allows to make informed choices when trying to build the optimal visual feature extractor. Finally, our network outperforms previous SNN architectures in CIFAR-10 (94.14%) and CIFAR-100 (74.65%) datasets and matches the state of the art in DVS-CIFAR10 (72.98%), with less parameters than the previous state of the art and without the need for ANN-SNN conversion. Code available at https://github.com/VicenteAlex/Spiking_ResNet
    Inspection-L: A Self-Supervised GNN-Based Money Laundering Detection System for Bitcoin. (arXiv:2203.10465v2 [cs.CR] UPDATED)
    Criminals have become increasingly experienced in using cryptocurrencies, such as Bitcoin, for money laundering. The use of cryptocurrencies can hide criminal identities and transfer hundreds of millions of dollars of dirty funds through their criminal digital wallets. However, this is considered a paradox because cryptocurrencies are gold mines for open-source intelligence, allowing law enforcement agencies to have more power in conducting forensic analyses. This paper proposed Inspection-L, a graph neural network (GNN) framework based on self-supervised Deep Graph Infomax (DGI), with supervised learning algorithms, namely Random Forest (RF) to detect illicit transactions for AML. To the best of our knowledge, our proposal is the first of applying self-supervised GNNs to the problem of AML in Bitcoin. The proposed method has been evaluated on the Elliptic dataset and shows that our approach outperforms the baseline in terms of key classification metrics, which demonstrates the potential of self-supervised GNN in cryptocurrency illicit transaction detection.
    Meaningful machine learning models and machine-learned pharmacophores from fragment screening campaigns. (arXiv:2204.06348v1 [q-bio.BM])
    Machine learning (ML) is widely used in drug discovery to train models that predict protein-ligand binding. These models are of great value to medicinal chemists, in particular if they provide case-specific insight into the physical interactions that drive the binding process. In this study we derive ML models from over 50 fragment-screening campaigns to introduce two important elements that we believe are absent in most -- if not all -- ML studies of this type reported to date: First, alongside the observed hits we use to train our models, we incorporate true misses and show that these experimentally validated negative data are of significant importance to the quality of the derived models. Second, we provide a physically interpretable and verifiable representation of what the ML model considers important for successful binding. This representation is derived from a straightforward attribution procedure that explains the prediction in terms of the (inter-)action of chemical environments. Critically, we validate the attribution outcome on a large scale against prior annotations made independently by expert molecular modellers. We find good agreement between the key molecular substructures proposed by the ML model and those assigned manually, even when the model's performance in discriminating hits from misses is far from perfect. By projecting the attribution onto predefined interaction prototypes (pharmacophores), we show that ML allows us to formulate simple rules for what drives fragment binding against a target automatically from screening data.
    Online greedy identification of linear dynamical systems. (arXiv:2204.06375v1 [stat.ML])
    This work addresses the problem of exploration in an unknown environment. For linear dynamical systems, we use an experimental design framework and introduce an online greedy policy where the control maximizes the information of the next step. In a setting with a limited number of experimental trials, our algorithm has low complexity and shows experimentally competitive performances compared to more elaborate gradient-based methods.
    Convex-Concave Min-Max Stackelberg Games. (arXiv:2110.05192v4 [cs.GT] UPDATED)
    Min-max optimization problems (i.e., min-max games) have been attracting a great deal of attention because of their applicability to a wide range of machine learning problems. Although significant progress has been made recently, the literature to date has focused on games with independent strategy sets; little is known about solving games with dependent strategy sets, which can be characterized as min-max Stackelberg games. We introduce two first-order methods that solve a large class of convex-concave min-max Stackelberg games, and show that our methods converge in polynomial time. Min-max Stackelberg games were first studied by Wald, under the posthumous name of Wald's maximin model, a variant of which is the main paradigm used in robust optimization, which means that our methods can likewise solve many convex robust optimization problems. We observe that the computation of competitive equilibria in Fisher markets also comprises a min-max Stackelberg game. Further, we demonstrate the efficacy and efficiency of our algorithms in practice by computing competitive equilibria in Fisher markets with varying utility structures. Our experiments suggest potential ways to extend our theoretical results, by demonstrating how different smoothness properties can affect the convergence rate of our algorithms.
    Hybrid Neural Network Augmented Physics-based Models for Nonlinear Filtering. (arXiv:2204.06471v1 [cs.LG])
    In this paper we present a hybrid neural network augmented physics-based modeling (APBM) framework for Bayesian nonlinear latent space estimation. The proposed APBM strategy allows for model adaptation when new operation conditions come into play or the physics-based model is insufficient (or incomplete) to properly describe the latent phenomenon. One advantage of the APBMs and our estimation procedure is the capability of maintaining the physical interpretability of estimated states. Furthermore, we propose a constraint filtering approach to control the neural network contributions to the overall model. We also exploit assumed density filtering techniques and cubature integration rules to present a flexible estimation strategy that can easily deal with nonlinear models and high-dimensional latent spaces. Finally, we demonstrate the efficacy of our methodology by leveraging a target tracking scenario with nonlinear and incomplete measurement and acceleration models, respectively.
    Overparameterized Linear Regression under Adversarial Attacks. (arXiv:2204.06274v1 [stat.ML])
    As machine learning models start to be used in critical applications, their vulnerabilities and brittleness become a pressing concern. Adversarial attacks are a popular framework for studying these vulnerabilities. In this work, we study the error of linear regression in the face of adversarial attacks. We provide bounds of the error in terms of the traditional risk and the parameter norm and show how these bounds can be leveraged and make it possible to use analysis from non-adversarial setups to study the adversarial risk. The usefulness of these results is illustrated by shedding light on whether or not overparameterized linear models can be adversarially robust. We show that adding features to linear models might be either a source of additional robustness or brittleness. We show that these differences appear due to scaling and how the $\ell_1$ and $\ell_2$ norms of random projections concentrate. We also show how the reformulation we propose allows for solving adversarial training as a convex optimization problem. This is then used as a tool to study how adversarial training and other regularization methods might affect the robustness of the estimated models.
    Safer Autonomous Driving in a Stochastic, Partially-Observable Environment by Hierarchical Contingency Planning. (arXiv:2204.06509v1 [cs.LG])
    When learning to act in a stochastic, partially observable environment, an intelligent agent should be prepared to anticipate a change in its belief of the environment state, and be capable of adapting its actions on-the-fly to changing conditions. As humans, we are able to form contingency plans when learning a task with the explicit aim of being able to correct errors in the initial control, and hence prove useful if ever there is a sudden change in our perception of the environment which requires immediate corrective action. This is especially the case for autonomous vehicles (AVs) navigating real-world situations where safety is paramount, and a strong ability to react to a changing belief about the environment is truly needed. In this paper we explore an end-to-end approach, from training to execution, for learning robust contingency plans and combining them with a hierarchical planner to obtain a robust agent policy in an autonomous navigation task where other vehicles' behaviours are unknown, and the agent's belief about these behaviours is subject to sudden, last-second change. We show that our approach results in robust, safe behaviour in a partially observable, stochastic environment, generalizing well over environment dynamics not seen during training.
    Massive MIMO Beam Management in Sub-6 GHz 5G NR. (arXiv:2204.06064v1 [eess.SP])
    Beam codebooks are a new feature of massive multiple-input multiple-output (M-MIMO) in 5G new radio (NR). Codebooks comprised of beamforming vectors are used to transmit reference signals and obtain limited channel state information (CSI) from receivers via the codeword index. This enables large arrays that cannot otherwise obtain sufficient CSI. The performance, however, is limited by the codebook design. In this paper, we show that machine learning can be used to train site-specific codebooks for initial access. We design a neural network based on an autoencoder architecture that uses a beamspace observation in combination with RF environment characteristics to improve the synchronization signal (SS) burst codebook. We test our algorithm using a flexible dataset of channels generated from QuaDRiGa. The results show that our model outperforms the industry standard (DFT beams) and approaches the optimal performance (perfect CSI and singular value decomposition (SVD)-based beamforming), using only a few bits of feedback.
    Data-heterogeneity-aware Mixing for Decentralized Learning. (arXiv:2204.06477v1 [cs.LG])
    Decentralized learning provides an effective framework to train machine learning models with data distributed over arbitrary communication graphs. However, most existing approaches toward decentralized learning disregard the interaction between data heterogeneity and graph topology. In this paper, we characterize the dependence of convergence on the relationship between the mixing weights of the graph and the data heterogeneity across nodes. We propose a metric that quantifies the ability of a graph to mix the current gradients. We further prove that the metric controls the convergence rate, particularly in settings where the heterogeneity across nodes dominates the stochasticity between updates for a given node. Motivated by our analysis, we propose an approach that periodically and efficiently optimizes the metric using standard convex constrained optimization and sketching techniques. Through comprehensive experiments on standard computer vision and NLP benchmarks, we show that our approach leads to improvement in test performance for a wide range of tasks.
    Approximation of Lipschitz Functions using Deep Spline Neural Networks. (arXiv:2204.06233v1 [cs.LG])
    Lipschitz-constrained neural networks have many applications in machine learning. Since designing and training expressive Lipschitz-constrained networks is very challenging, there is a need for improved methods and a better theoretical understanding. Unfortunately, it turns out that ReLU networks have provable disadvantages in this setting. Hence, we propose to use learnable spline activation functions with at least 3 linear regions instead. We prove that this choice is optimal among all component-wise $1$-Lipschitz activation functions in the sense that no other weight constrained architecture can approximate a larger class of functions. Additionally, this choice is at least as expressive as the recently introduced non component-wise Groupsort activation function for spectral-norm-constrained weights. Previously published numerical results support our theoretical findings.
    Out-of-distribution Detection with Deep Nearest Neighbors. (arXiv:2204.06507v1 [cs.LG])
    Out-of-distribution (OOD) detection is a critical task for deploying machine learning models in the open world. Distance-based methods have demonstrated promise, where testing samples are detected as OOD if they are relatively far away from in-distribution (ID) data. However, prior methods impose a strong distributional assumption of the underlying feature space, which may not always hold. In this paper, we explore the efficacy of non-parametric nearest-neighbor distance for OOD detection, which has been largely overlooked in the literature. Unlike prior works, our method does not impose any distributional assumption, hence providing stronger flexibility and generality. We demonstrate the effectiveness of nearest-neighbor-based OOD detection on several benchmarks and establish superior performance. Under the same model trained on ImageNet-1k, our method substantially reduces the false positive rate (FPR@TPR95) by 24.77% compared to a strong baseline SSD+, which uses a parametric approach Mahalanobis distance in detection.
    Sigma-Delta and Distributed Noise-Shaping Quantization Methods for Random Fourier Features. (arXiv:2106.02614v2 [cs.LG] UPDATED)
    We propose the use of low bit-depth Sigma-Delta and distributed noise-shaping methods for quantizing the Random Fourier features (RFFs) associated with shift-invariant kernels. We prove that our quantized RFFs -- even in the case of $1$-bit quantization -- allow a high accuracy approximation of the underlying kernels, and the approximation error decays at least polynomially fast as the dimension of the RFFs increases. We also show that the quantized RFFs can be further compressed, yielding an excellent trade-off between memory use and accuracy. Namely, the approximation error now decays exponentially as a function of the bits used. Moreover, we empirically show by testing the performance of our methods on several machine learning tasks that our method compares favorably to other state of the art quantization methods in this context.
    TranAD: Deep Transformer Networks for Anomaly Detection in Multivariate Time Series Data. (arXiv:2201.07284v5 [cs.LG] UPDATED)
    Efficient anomaly detection and diagnosis in multivariate time-series data is of great importance for modern industrial applications. However, building a system that is able to quickly and accurately pinpoint anomalous observations is a challenging problem. This is due to the lack of anomaly labels, high data volatility and the demands of ultra-low inference times in modern applications. Despite the recent developments of deep learning approaches for anomaly detection, only a few of them can address all of these challenges. In this paper, we propose TranAD, a deep transformer network based anomaly detection and diagnosis model which uses attention-based sequence encoders to swiftly perform inference with the knowledge of the broader temporal trends in the data. TranAD uses focus score-based self-conditioning to enable robust multi-modal feature extraction and adversarial training to gain stability. Additionally, model-agnostic meta learning (MAML) allows us to train the model using limited data. Extensive empirical studies on six publicly available datasets demonstrate that TranAD can outperform state-of-the-art baseline methods in detection and diagnosis performance with data and time-efficient training. Specifically, TranAD increases F1 scores by up to 17%, reducing training times by up to 99% compared to the baselines.
    ADASYN-Random Forest Based Intrusion Detection Model. (arXiv:2105.04301v5 [cs.CR] UPDATED)
    Intrusion detection has been a key topic in the field of cyber security, and the common network threats nowadays have the characteristics of varieties and variation. Considering the serious imbalance of intrusion detection datasets will result in low classification performance on attack behaviors of small sample size and difficulty to detect network attacks accurately and efficiently, using Adaptive Synthetic Sampling (ADASYN) method to balance datasets was proposed in this paper. In addition, Random Forest algorithm was used to train intrusion detection classifiers. Through the comparative experiment of Intrusion detection on CICIDS 2017 dataset, it is found that ADASYN with Random Forest performs better. Based on the experimental results, the improvement of precision, recall, F1 scores and AUC values after ADASYN is then analyzed. Experiments show that the proposed method can be applied to intrusion detection with large data, and can effectively improve the classification accuracy of network attack behaviors. Compared with traditional machine learning models, it has better performance, generalization ability and robustness.
    Deterministic and Discriminative Imitation (D2-Imitation): Revisiting Adversarial Imitation for Sample Efficiency. (arXiv:2112.06054v3 [cs.LG] UPDATED)
    Sample efficiency is crucial for imitation learning methods to be applicable in real-world applications. Many studies improve sample efficiency by extending adversarial imitation to be off-policy regardless of the fact that these off-policy extensions could either change the original objective or involve complicated optimization. We revisit the foundation of adversarial imitation and propose an off-policy sample efficient approach that requires no adversarial training or min-max optimization. Our formulation capitalizes on two key insights: (1) the similarity between the Bellman equation and the stationary state-action distribution equation allows us to derive a novel temporal difference (TD) learning approach; and (2) the use of a deterministic policy simplifies the TD learning. Combined, these insights yield a practical algorithm, Deterministic and Discriminative Imitation (D2-Imitation), which operates by first partitioning samples into two replay buffers and then learning a deterministic policy via off-policy reinforcement learning. Our empirical results show that D2-Imitation is effective in achieving good sample efficiency, outperforming several off-policy extension approaches of adversarial imitation on many control tasks.
    Enabling Synthetic Data adoption in regulated domains. (arXiv:2204.06297v1 [cs.LG])
    The switch from a Model-Centric to a Data-Centric mindset is putting emphasis on data and its quality rather than algorithms, bringing forward new challenges. In particular, the sensitive nature of the information in highly regulated scenarios needs to be accounted for. Specific approaches to address the privacy issue have been developed, as Privacy Enhancing Technologies. However, they frequently cause loss of information, putting forward a crucial trade-off among data quality and privacy. A clever way to bypass such a conundrum relies on Synthetic Data: data obtained from a generative process, learning the real data properties. Both Academia and Industry realized the importance of evaluating synthetic data quality: without all-round reliable metrics, the innovative data generation task has no proper objective function to maximize. Despite that, the topic remains under-explored. For this reason, we systematically catalog the important traits of synthetic data quality and privacy, and devise a specific methodology to test them. The result is DAISYnt (aDoption of Artificial Intelligence SYnthesis): a comprehensive suite of advanced tests, which sets a de facto standard for synthetic data evaluation. As a practical use-case, a variety of generative algorithms have been trained on real-world Credit Bureau Data. The best model has been assessed, using DAISYnt on the different synthetic replicas. Further potential uses, among others, entail auditing and fine-tuning of generative models or ensuring high quality of a given synthetic dataset. From a prescriptive viewpoint, eventually, DAISYnt may pave the way to synthetic data adoption in highly regulated domains, ranging from Finance to Healthcare, through Insurance and Education.
    Learning Self-Modulating Attention in Continuous Time Space with Applications to Sequential Recommendation. (arXiv:2204.06517v1 [cs.IR])
    User interests are usually dynamic in the real world, which poses both theoretical and practical challenges for learning accurate preferences from rich behavior data. Among existing user behavior modeling solutions, attention networks are widely adopted for its effectiveness and relative simplicity. Despite being extensively studied, existing attentions still suffer from two limitations: i) conventional attentions mainly take into account the spatial correlation between user behaviors, regardless the distance between those behaviors in the continuous time space; and ii) these attentions mostly provide a dense and undistinguished distribution over all past behaviors then attentively encode them into the output latent representations. This is however not suitable in practical scenarios where a user's future actions are relevant to a small subset of her/his historical behaviors. In this paper, we propose a novel attention network, named self-modulating attention, that models the complex and non-linearly evolving dynamic user preferences. We empirically demonstrate the effectiveness of our method on top-N sequential recommendation tasks, and the results on three large-scale real-world datasets show that our model can achieve state-of-the-art performance.
    A Review of Machine Learning Methods Applied to Structural Dynamics and Vibroacoustic. (arXiv:2204.06362v1 [cs.LG])
    The use of Machine Learning (ML) has rapidly spread across several fields, having encountered many applications in Structural Dynamics and Vibroacoustic (SD\&V). The increasing capabilities of ML to unveil insights from data, driven by unprecedented data availability, algorithms advances and computational power, enhance decision making, uncertainty handling, patterns recognition and real-time assessments. Three main applications in SD\&V have taken advantage of these benefits. In Structural Health Monitoring, ML detection and prognosis lead to safe operation and optimized maintenance schedules. System identification and control design are leveraged by ML techniques in Active Noise Control and Active Vibration Control. Finally, the so-called ML-based surrogate models provide fast alternatives to costly simulations, enabling robust and optimized product design. Despite the many works in the area, they have not been reviewed and analyzed. Therefore, to keep track and understand this ongoing integration of fields, this paper presents a survey of ML applications in SD\&V analyses, shedding light on the current state of implementation and emerging opportunities. The main methodologies, advantages, limitations, and recommendations based on scientific knowledge were identified for each of the three applications. Moreover, the paper considers the role of Digital Twins and Physics Guided ML to overcome current challenges and power future research progress. As a result, the survey provides a broad overview of the present landscape of ML applied in SD\&V and guides the reader to an advanced understanding of progress and prospects in the field.
    Local and global topological complexity measures OF ReLU neural network functions. (arXiv:2204.06062v1 [math.AT])
    We apply a generalized piecewise-linear (PL) version of Morse theory due to Grunert-Kuhnel-Rote to define and study new local and global notions of topological complexity for fully-connected feedforward ReLU neural network functions, F: R^n -> R. Along the way, we show how to construct, for each such F, a canonical polytopal complex K(F) and a deformation retract of the domain onto K(F), yielding a convenient compact model for performing calculations. We also give a combinatorial description of local complexity for depth 2 networks, and a construction showing that local complexity can be arbitrarily high.
    Do We Need Anisotropic Graph Neural Networks?. (arXiv:2104.01481v4 [cs.LG] UPDATED)
    Common wisdom in the graph neural network (GNN) community dictates that anisotropic models -- in which messages sent between nodes are a function of both the source and target node -- are required to achieve state-of-the-art performance. Benchmarks to date have demonstrated that these models perform better than comparable isotropic models -- where messages are a function of the source node only. In this work we provide empirical evidence challenging this narrative: we propose an isotropic GNN, which we call Efficient Graph Convolution (EGC), that consistently outperforms comparable anisotropic models, including the popular GAT or PNA architectures by using spatially-varying adaptive filters. In addition to raising important questions for the GNN community, our work has significant real-world implications for efficiency. EGC achieves higher model accuracy, with lower memory consumption and latency, along with characteristics suited to accelerator implementation, while being a drop-in replacement for existing architectures. As an isotropic model, it requires memory proportional to the number of vertices in the graph ($\mathcal{O}(V)$); in contrast, anisotropic models require memory proportional to the number of edges ($\mathcal{O}(E)$). We demonstrate that EGC outperforms existing approaches across 6 large and diverse benchmark datasets, and conclude by discussing questions that our work raise for the community going forward. Code and pretrained models for our experiments are provided at https://github.com/shyam196/egc.
    Slope stability predictions on spatially variable random fields using machine learning surrogate models. (arXiv:2204.06097v1 [cs.LG])
    Random field Monte Carlo (MC) reliability analysis is a robust stochastic method to determine the probability of failure. This method, however, requires a large number of numerical simulations demanding high computational costs. This paper explores the efficiency of different machine learning (ML) algorithms used as surrogate models trained on a limited number of random field slope stability simulations in predicting the results of large datasets. The MC data in this paper require only the examination of failure or non-failure, circumventing the time-consuming calculation of factors of safety. An extensive dataset is generated, consisting of 120,000 finite difference MC slope stability simulations incorporating different levels of soil heterogeneity and anisotropy. The Bagging Ensemble, Random Forest and Support Vector classifiers are found to be the superior models for this problem amongst 9 different models and ensemble classifiers. Trained only on 0.47% of data (500 samples), the ML model can classify the entire 120,000 samples with an accuracy of %85 and AUC score of %91. The performance of ML methods in classifying the random field slope stability results generally reduces with higher anisotropy and heterogeneity of soil. The ML assisted MC reliability analysis proves a robust stochastic method where errors in the predicted probability of failure using %5 of MC data is only %0.46 in average. The approach reduced the computational time from 306 days to less than 6 hours.
    Conditional Gradients for the Approximately Vanishing Ideal. (arXiv:2202.03349v6 [cs.LG] UPDATED)
    The vanishing ideal of a set of points $X\subseteq \mathbb{R}^n$ is the set of polynomials that evaluate to $0$ over all points $\mathbf{x} \in X$ and admits an efficient representation by a finite set of polynomials called generators. To accommodate the noise in the data set, we introduce the Conditional Gradients Approximately Vanishing Ideal algorithm (CGAVI) for the construction of the set of generators of the approximately vanishing ideal. The constructed set of generators captures polynomial structures in data and gives rise to a feature map that can, for example, be used in combination with a linear classifier for supervised learning. In CGAVI, we construct the set of generators by solving specific instances of (constrained) convex optimization problems with the Pairwise Frank-Wolfe algorithm (PFW). Among other things, the constructed generators inherit the LASSO generalization bound and not only vanish on the training but also on out-sample data. Moreover, CGAVI admits a compact representation of the approximately vanishing ideal by constructing few generators with sparse coefficient vectors.
    Deep Annotation of Therapeutic Working Alliance in Psychotherapy. (arXiv:2204.05522v1 [q-bio.NC] CROSS LISTED)
    The therapeutic working alliance is an important predictor of the outcome of the psychotherapy treatment. In practice, the working alliance is estimated from a set of scoring questionnaires in an inventory that both the patient and the therapists fill out. In this work, we propose an analytical framework of directly inferring the therapeutic working alliance from the natural language within the psychotherapy sessions in a turn-level resolution with deep embeddings such as the Doc2Vec and SentenceBERT models. The transcript of each psychotherapy session can be transcribed and generated in real-time from the session speech recordings, and these embedded dialogues are compared with the distributed representations of the statements in the working alliance inventory. We demonstrate, in a real-world dataset with over 950 sessions of psychotherapy treatments in anxiety, depression, schizophrenia and suicidal patients, the effectiveness of this method in mapping out trajectories of patient-therapist alignment and the interpretability that can offer insights in clinical psychiatry. We believe such a framework can be provide timely feedback to the therapist regarding the quality of the conversation in interview sessions.
    Neural Operator with Regularity Structure for Modeling Dynamics Driven by SPDEs. (arXiv:2204.06255v1 [cs.LG])
    Stochastic partial differential equations (SPDEs) are significant tools for modeling dynamics in many areas including atmospheric sciences and physics. Neural Operators, generations of neural networks with capability of learning maps between infinite-dimensional spaces, are strong tools for solving parametric PDEs. However, they lack the ability to modeling SPDEs which usually have poor regularity due to the driving noise. As the theory of regularity structure has achieved great successes in analyzing SPDEs and provides the concept model feature vectors that well-approximate SPDEs' solutions, we propose the Neural Operator with Regularity Structure (NORS) which incorporates the feature vectors for modeling dynamics driven by SPDEs. We conduct experiments on various of SPDEs including the dynamic Phi41 model and the 2d stochastic Navier-Stokes equation, and the results demonstrate that the NORS is resolution-invariant, efficient, and achieves one order of magnitude lower error with a modest amount of data.
    Random Graph Embedding and Joint Sparse Regularization for Multi-label Feature Selection. (arXiv:2204.06445v1 [stat.ML])
    Multi-label learning is often used to mine the correlation between variables and multiple labels, and its research focuses on fully extracting the information between variables and labels. The $\ell_{2,1}$ regularization is often used to get a sparse coefficient matrix, but the problem of multicollinearity among variables cannot be effectively solved. In this paper, the proposed model can choose the most relevant variables by solving a joint constraint optimization problem using the $\ell_{2,1}$ regularization and Frobenius regularization. In manifold regularization, we carry out a random walk strategy based on the joint structure to construct a neighborhood graph, which is highly robust to outliers. In addition, we give an iterative algorithm of the proposed method and proved the convergence of this algorithm. The experiments on the real-world data sets also show that the comprehensive performance of our method is consistently better than the classical method.
    Is Speech Pathology a Biomarker in Automatic Speaker Verification?. (arXiv:2204.06450v1 [cs.SD])
    With the advancements in deep learning (DL) and an increasing interest in data-driven speech processing methods, a major challenge for speech data scientists in the healthcare domain is the anonymization of pathological speech, which is a required step to be able to make them accessible as a public training resource. In this paper, we investigate pathological speech data and compare their speaker verifiability with that of healthy individuals. We utilize a large pathological speech corpus of more than 2,000 test subjects with various speech and voice disorders from different ages and apply DL-based automatic speaker verification (ASV) techniques. As a result, we obtained a mean equal error rate (EER) of 0.86% with a standard deviation of 0.16%, which is a factor of three lower than comparable healthy speech databases. We further perform detailed analyses of external influencing factors on ASV such as age, pathology, recording environment, and utterance length, to explore their respective effect. Our findings indicate that speech pathology is a potential biomarker in ASV. This is potentially of high interest for the anonymization of pathological speech data.
    Flexible Multiple-Objective Reinforcement Learning for Chip Placement. (arXiv:2204.06407v1 [cs.LG])
    Recently, successful applications of reinforcement learning to chip placement have emerged. Pretrained models are necessary to improve efficiency and effectiveness. Currently, the weights of objective metrics (e.g., wirelength, congestion, and timing) are fixed during pretraining. However, fixed-weighed models cannot generate the diversity of placements required for engineers to accommodate changing requirements as they arise. This paper proposes flexible multiple-objective reinforcement learning (MORL) to support objective functions with inference-time variable weights using just a single pretrained model. Our macro placement results show that MORL can generate the Pareto frontier of multiple objectives effectively.
    Epistemic Neural Networks. (arXiv:2107.08924v2 [cs.LG] UPDATED)
    Effective decision, exploration, and adaptation often require an agent to know what it knows and, also, what it does not know. This capability relies on the quality of \textit{joint} predictions of labels assigned to multiple inputs. Conventional neural networks lack this capability and, since most research has focused on marginal predictions, this shortcoming has been largely overlooked. By assessing the quality of joint predictions it is possible to determine whether a neural network effectively distinguishes between epistemic uncertainty (that due to lack of knowledge) and aleatoric uncertainty (that due to chance). We introduce the \textit{epistemic neural network} (ENN) as a general interface for uncertainty modeling in deep learning. While prior approaches to uncertainty modeling can be viewed as ENNs, the new interface facilitates comparison of joint predictions, and the design of novel architectures and algorithms. In particular, we introduce the \textit{epinet}: an architecture that can supplement any existing neural network, including pretrained models, and trained with modest incremental computation to represent uncertainty. With an epinet, conventional neural networks outperform very large ensembles, consisting of hundreds or more particles, with orders of magnitude less computation. We demonstrate this efficacy across synthetic data, ImageNet, and sequential decision problems. As part of this effort we open-source experiment code.  ( 2 min )
    CowClip: Reducing CTR Prediction Model Training Time from 12 hours to 10 minutes on 1 GPU. (arXiv:2204.06240v1 [cs.LG])
    The click-through rate (CTR) prediction task is to predict whether a user will click on the recommended item. As mind-boggling amounts of data are produced online daily, accelerating CTR prediction model training is critical to ensuring an up-to-date model and reducing the training cost. One approach to increase the training speed is to apply large batch training. However, as shown in computer vision and natural language processing tasks, training with a large batch easily suffers from the loss of accuracy. Our experiments show that previous scaling rules fail in the training of CTR prediction neural networks. To tackle this problem, we first theoretically show that different frequencies of ids make it challenging to scale hyperparameters when scaling the batch size. To stabilize the training process in a large batch size setting, we develop the adaptive Column-wise Clipping (CowClip). It enables an easy and effective scaling rule for the embeddings, which keeps the learning rate unchanged and scales the L2 loss. We conduct extensive experiments with four CTR prediction networks on two real-world datasets and successfully scaled 128 times the original batch size without accuracy loss. In particular, for CTR prediction model DeepFM training on the Criteo dataset, our optimization framework enlarges the batch size from 1K to 128K with over 0.1% AUC improvement and reduces training time from 12 hours to 10 minutes on a single V100 GPU. Our code locates at https://github.com/zhengzangw/LargeBatchCTR.
    Large-scale multi-objective influence maximisation with network downscaling. (arXiv:2204.06250v1 [cs.SI])
    Finding the most influential nodes in a network is a computationally hard problem with several possible applications in various kinds of network-based problems. While several methods have been proposed for tackling the influence maximisation (IM) problem, their runtime typically scales poorly when the network size increases. Here, we propose an original method, based on network downscaling, that allows a multi-objective evolutionary algorithm (MOEA) to solve the IM problem on a reduced scale network, while preserving the relevant properties of the original network. The downscaled solution is then upscaled to the original network, using a mechanism based on centrality metrics such as PageRank. Our results on eight large networks (including two with $\sim$50k nodes) demonstrate the effectiveness of the proposed method with a more than 10-fold runtime gain compared to the time needed on the original network, and an up to $82\%$ time reduction compared to CELF.
    DT2CAM: A Decision Tree to Content Addressable Memory Framework. (arXiv:2204.06114v1 [cs.AR])
    Decision trees are considered one of the most powerful tools for data classification. Accelerating the decision tree search is crucial for on-the-edge applications that have limited power and latency budget. In this paper, we propose a Content Addressable Memory (CAM) Compiler for Decision Tree (DT) inference acceleration. We propose a novel "adaptive-precision" scheme that results in a compact implementation and enables an efficient bijective mapping to Ternary Content Addressable Memories while maintaining high inference accuracies. In addition, a Resistive-CAM (ReCAM) functional synthesizer is developed for mapping the decision tree to the ReCAM and performing functional simulations for energy, latency, and accuracy evaluations. We study the decision tree accuracy under hardware non-idealities including device defects, manufacturing variability, and input encoding noise. We test our framework on various DT datasets including \textit{Give Me Some Credit}, \textit{Titanic}, and \textit{COVID-19}. Our results reveal up to {42.4\%} energy savings and up to 17.8x better energy-delay-area product compared to the state-of-art hardware accelerators, and up to 333 million decisions per sec for the pipelined implementation.
    Learnable Hypergraph Laplacian for Hypergraph Learning. (arXiv:2106.06666v2 [cs.LG] UPDATED)
    Hypergraph Convolutional Neural Networks (HGCNNs) have demonstrated their potential in modeling high-order relations preserved in graph-structured data. However, most existing convolution filters are localized and determined by the pre-defined initial hypergraph topology, neglecting to explore implicit and long-range relations in real-world data. In this paper, we propose the first learning-based method tailored for constructing adaptive hypergraph structure, termed HypERgrAph Laplacian aDaptor (HERALD), which serves as a generic plug-and-play module for improving the representational power of HGCNNs.Specifically, HERALD adaptively optimizes the adjacency relationship between vertices and hyperedges in an end-to-end manner and thus the task-aware hypergraph is learned. Furthermore, HERALD employs the self-attention mechanism to capture the non-local paired-nodes relation. Extensive experiments on various popular hypergraph datasets for node classification and graph classification tasks demonstrate that our approach obtains consistent and considerable performance enhancement, proving its effectiveness and generalization ability.
    InCoder: A Generative Model for Code Infilling and Synthesis. (arXiv:2204.05999v1 [cs.SE])
    Code is seldom written in a single left-to-right pass and is instead repeatedly edited and refined. We introduce InCoder, a unified generative model that can perform program synthesis (via left-to-right generation) as well as editing (via infilling). InCoder is trained to generate code files from a large corpus of permissively licensed code, where regions of code have been randomly masked and moved to the end of each file, allowing code infilling with bidirectional context. Our model is the first generative model that is able to directly perform zero-shot code infilling, which we evaluate on challenging tasks such as type inference, comment generation, and variable re-naming. We find that the ability to condition on bidirectional context substantially improves performance on these tasks, while still performing comparably on standard program synthesis benchmarks in comparison to left-to-right only models pretrained at similar scale. The InCoder models and code are publicly released. https://sites.google.com/view/incoder-code-models
    An Analysis on Ensemble Learning optimized Medical Image Classification with Deep Convolutional Neural Networks. (arXiv:2201.11440v2 [cs.CV] UPDATED)
    Novel and high-performance medical image classification pipelines are heavily utilizing ensemble learning strategies. The idea of ensemble learning is to assemble diverse models or multiple predictions and, thus, boost prediction performance. However, it is still an open question to what extent as well as which ensemble learning strategies are beneficial in deep learning based medical image classification pipelines. In this work, we proposed a reproducible medical image classification pipeline for analyzing the performance impact of the following ensemble learning techniques: Augmenting, Stacking, and Bagging. The pipeline consists of state-of-the-art preprocessing and image augmentation methods as well as 9 deep convolution neural network architectures. It was applied on four popular medical imaging datasets with varying complexity. Furthermore, 12 pooling functions for combining multiple predictions were analyzed, ranging from simple statistical functions like unweighted averaging up to more complex learning-based functions like support vector machines. Our results revealed that Stacking achieved the largest performance gain of up to 13% F1-score increase. Augmenting showed consistent improvement capabilities by up to 4% and is also applicable to single model based pipelines. Cross-validation based Bagging demonstrated significant performance gain close to Stacking, which resulted in an F1-score increase up to +11%. Furthermore, we demonstrated that simple statistical pooling functions are equal or often even better than more complex pooling functions. We concluded that the integration of ensemble learning techniques is a powerful method for any medical image classification pipeline to improve robustness and boost performance.
    Decentralized Collaborative Learning Framework for Next POI Recommendation. (arXiv:2204.06516v1 [cs.IR])
    Next Point-of-Interest (POI) recommendation has become an indispensable functionality in Location-based Social Networks (LBSNs) due to its effectiveness in helping people decide the next POI to visit. However, accurate recommendation requires a vast amount of historical check-in data, thus threatening user privacy as the location-sensitive data needs to be handled by cloud servers. Although there have been several on-device frameworks for privacy-preserving POI recommendations, they are still resource-intensive when it comes to storage and computation, and show limited robustness to the high sparsity of user-POI interactions. On this basis, we propose a novel decentralized collaborative learning framework for POI recommendation (DCLR), which allows users to train their personalized models locally in a collaborative manner. DCLR significantly reduces the local models' dependence on the cloud for training, and can be used to expand arbitrary centralized recommendation models. To counteract the sparsity of on-device user data when learning each local model, we design two self-supervision signals to pretrain the POI representations on the server with geographical and categorical correlations of POIs. To facilitate collaborative learning, we innovatively propose to incorporate knowledge from either geographically or semantically similar users into each local model with attentive aggregation and mutual information maximization. The collaborative learning process makes use of communications between devices while requiring only minor engagement from the central server for identifying user groups, and is compatible with common privacy preservation mechanisms like differential privacy.  ( 2 min )
    Negative Sampling for Recommendation. (arXiv:2204.06520v1 [cs.IR])
    How to effectively sample high-quality negative instances is important for well training a recommendation model. We argue that a high-quality negative should be both \textit{informativeness} and \textit{unbiasedness}. Although previous studies have proposed some approaches to address the informativeness in negative sampling, few has been done to discriminating false negative from true negative for unbiased negative sampling, not to mention taking both into consideration. This paper first adopts a parameter learning perspective to analyze negative informativeness and unbiasedness in loss gradient-based model training. We argue that both negative sampling and collaborative filtering include an implicit task of negative classification, from which we report an insightful yet beneficial finding about the order relation in predicted negatives' scores. Based on our finding and by regarding negatives as random variables, we next derive the class condition density of true negatives and that of false negatives. We also design a Bayesian classifier for negative classification, from which we define a quantitative unbiasedness measure for negatives. Finally, we propose to use a harmonic mean of informativeness and unbiasedness to sample high-quality negatives. Experimental studies validate the superiority of our negative sampling algorithm over the peers in terms of better sampling quality and better recommendation performance.  ( 2 min )
    VisCUIT: Visual Auditor for Bias in CNN Image Classifier. (arXiv:2204.05899v2 [cs.CV] UPDATED)
    CNN image classifiers are widely used, thanks to their efficiency and accuracy. However, they can suffer from biases that impede their practical applications. Most existing bias investigation techniques are either inapplicable to general image classification tasks or require significant user efforts in perusing all data subgroups to manually specify which data attributes to inspect. We present VisCUIT, an interactive visualization system that reveals how and why a CNN classifier is biased. VisCUIT visually summarizes the subgroups on which the classifier underperforms and helps users discover and characterize the cause of the underperformances by revealing image concepts responsible for activating neurons that contribute to misclassifications. VisCUIT runs in modern browsers and is open-source, allowing people to easily access and extend the tool to other model architectures and datasets. VisCUIT is available at the following public demo link: https://poloclub.github.io/VisCUIT. A video demo is available at https://youtu.be/eNDbSyM4R_4.  ( 2 min )
    Coverage and Capacity Optimization in STAR-RISs Assisted Networks: A Machine Learning Approach. (arXiv:2204.06390v1 [cs.IT])
    Coverage and capacity are the important metrics for performance evaluation in wireless networks, while the coverage and capacity have several conflicting relationships, e.g. high transmit power contributes to large coverage but high inter-cell interference reduces the capacity performance. Therefore, in order to strike a balance between the coverage and capacity, a novel model is proposed for the coverage and capacity optimization of simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RISs) assisted networks. To solve the coverage and capacity optimization (CCO) problem, a machine learning-based multi-objective optimization algorithm, i.e., the multi-objective proximal policy optimization (MO-PPO) algorithm, is proposed. In this algorithm, a loss function-based update strategy is the core point, which is able to calculate weights for both loss functions of coverage and capacity by a min-norm solver at each update. The numerical results demonstrate that the investigated update strategy outperforms the fixed weight-based MO algorithms.  ( 2 min )
    GenIE: Generative Information Extraction. (arXiv:2112.08340v3 [cs.CL] UPDATED)
    Structured and grounded representation of text is typically formalized by closed information extraction, the problem of extracting an exhaustive set of (subject, relation, object) triplets that are consistent with a predefined set of entities and relations from a knowledge base schema. Most existing works are pipelines prone to error accumulation, and all approaches are only applicable to unrealistically small numbers of entities and relations. We introduce GenIE (generative information extraction), the first end-to-end autoregressive formulation of closed information extraction. GenIE naturally exploits the language knowledge from the pre-trained transformer by autoregressively generating relations and entities in textual form. Thanks to a new bi-level constrained generation strategy, only triplets consistent with the predefined knowledge base schema are produced. Our experiments show that GenIE is state-of-the-art on closed information extraction, generalizes from fewer training data points than baselines, and scales to a previously unmanageable number of entities and relations. With this work, closed information extraction becomes practical in realistic scenarios, providing new opportunities for downstream tasks. Finally, this work paves the way towards a unified end-to-end approach to the core tasks of information extraction. Code, data and models available at https://github.com/epfl-dlab/GenIE.  ( 2 min )
    Efficient Non-parametric Bayesian Hawkes Processes. (arXiv:1810.03730v5 [cs.LG] UPDATED)
    In this paper, we develop an efficient nonparametric Bayesian estimation of the kernel function of Hawkes processes. The non-parametric Bayesian approach is important because it provides flexible Hawkes kernels and quantifies their uncertainty. Our method is based on the cluster representation of Hawkes processes. Utilizing the finite support assumption of the Hawkes process, we efficiently sample random branching structures and thus, we split the Hawkes process into clusters of Poisson processes. We derive two algorithms -- a block Gibbs sampler and a maximum a posteriori estimator based on expectation maximization -- and we show that our methods have a linear time complexity, both theoretically and empirically. On synthetic data, we show our methods to be able to infer flexible Hawkes triggering kernels. On two large-scale Twitter diffusion datasets, we show that our methods outperform the current state-of-the-art in goodness-of-fit and that the time complexity is linear in the size of the dataset. We also observe that on diffusions related to online videos, the learned kernels reflect the perceived longevity for different content types such as music or pets videos.  ( 2 min )
    Automatic Multi-Label Prompting: Simple and Interpretable Few-Shot Classification. (arXiv:2204.06305v1 [cs.CL])
    Prompt-based learning (i.e., prompting) is an emerging paradigm for exploiting knowledge learned by a pretrained language model. In this paper, we propose Automatic Multi-Label Prompting (AMuLaP), a simple yet effective method to automatically select label mappings for few-shot text classification with prompting. Our method exploits one-to-many label mappings and a statistics-based algorithm to select label mappings given a prompt template. Our experiments demonstrate that AMuLaP achieves competitive performance on the GLUE benchmark without human effort or external resources.  ( 2 min )
    Active Diffusion and VCA-Assisted Image Segmentation of Hyperspectral Images. (arXiv:2204.06298v1 [cs.CV])
    Hyperspectral images encode rich structure that can be exploited for material discrimination by machine learning algorithms. This article introduces the Active Diffusion and VCA-Assisted Image Segmentation (ADVIS) for active material discrimination. ADVIS selects high-purity, high-density pixels that are far in diffusion distance (a data-dependent metric) from other high-purity, high-density pixels in the hyperspectral image. The ground truth labels of these pixels are queried and propagated to the rest of the image. The ADVIS active learning algorithm is shown to strongly outperform its fully unsupervised clustering algorithm counterpart, suggesting that the incorporation of a very small number of carefully-selected ground truth labels can result in substantially superior material discrimination in hyperspectral images.  ( 2 min )
    Receding Neuron Importances for Structured Pruning. (arXiv:2204.06404v1 [cs.LG])
    Structured pruning efficiently compresses networks by identifying and removing unimportant neurons. While this can be elegantly achieved by applying sparsity-inducing regularisation on BatchNorm parameters, an L1 penalty would shrink all scaling factors rather than just those of superfluous neurons. To tackle this issue, we introduce a simple BatchNorm variation with bounded scaling parameters, based on which we design a novel regularisation term that suppresses only neurons with low importance. Under our method, the weights of unnecessary neurons effectively recede, producing a polarised bimodal distribution of importances. We show that neural networks trained this way can be pruned to a larger extent and with less deterioration. We one-shot prune VGG and ResNet architectures at different ratios on CIFAR and ImagenNet datasets. In the case of VGG-style networks, our method significantly outperforms existing approaches particularly under a severe pruning regime.  ( 2 min )
    The Exponentially Tilted Gaussian Prior for Variational Autoencoders. (arXiv:2111.15646v3 [cs.LG] UPDATED)
    An important property for deep neural networks is the ability to perform robust out-of-distribution detection on previously unseen data. This property is essential for safety purposes when deploying models for real world applications. Recent studies show that probabilistic generative models can perform poorly on this task, which is surprising given that they seek to estimate the likelihood of training data. To alleviate this issue, we propose the exponentially tilted Gaussian prior distribution for the Variational Autoencoder (VAE) which pulls points onto the surface of a hyper-sphere in latent space. This achieves state-of-the art results on the area under the curve-receiver operator characteristics metric using just the log-likelihood that the VAE naturally assigns. Because this prior is a simple modification of the traditional VAE prior, it is faster and easier to implement than competitive methods.  ( 2 min )
    L3Cube-MahaNER: A Marathi Named Entity Recognition Dataset and BERT models. (arXiv:2204.06029v1 [cs.CL])
    Named Entity Recognition (NER) is a basic NLP task and finds major applications in conversational and search systems. It helps us identify key entities in a sentence used for the downstream application. NER or similar slot filling systems for popular languages have been heavily used in commercial applications. In this work, we focus on Marathi, an Indian language, spoken prominently by the people of Maharashtra state. Marathi is a low resource language and still lacks useful NER resources. We present L3Cube-MahaNER, the first major gold standard named entity recognition dataset in Marathi. We also describe the manual annotation guidelines followed during the process. In the end, we benchmark the dataset on different CNN, LSTM, and Transformer based models like mBERT, XLM-RoBERTa, IndicBERT, MahaBERT, etc. The MahaBERT provides the best performance among all the models. The data and models are available at https://github.com/l3cube-pune/MarathiNLP .
    Prediction of motor insurance claims occurrence as an imbalanced machine learning problem. (arXiv:2204.06109v1 [q-fin.ST])
    The insurance industry, with its large datasets, is a natural place to use big data solutions. However it must be stressed, that significant number of applications for machine learning in insurance industry, like fraud detection or claim prediction, deals with the problem of machine learning on an imbalanced data set. This is due to the fact that frauds or claims are rare events when compared with the entire population of drivers. The problem of imbalanced learning is often hard to overcome. Therefore, the main goal of this work is to present and apply various methods of dealing with an imbalanced dataset in the context of claim occurrence prediction in car insurance. In addition, the above techniques are used to compare the results of machine learning algorithms in the context of claim occurrence prediction in car insurance. Our study covers the following techniques: logistic-regression, decision tree, random forest, xgBoost, feed-forward network. The problem is the classification one.
    Context-based Deep Learning Architecture with Optimal Integration Layer for Image Parsing. (arXiv:2204.06214v1 [cs.CV])
    Deep learning models have been efficient lately on image parsing tasks. However, deep learning models are not fully capable of exploiting visual and contextual information simultaneously. The proposed three-layer context-based deep architecture is capable of integrating context explicitly with visual information. The novel idea here is to have a visual layer to learn visual characteristics from binary class-based learners, a contextual layer to learn context, and then an integration layer to learn from both via genetic algorithm-based optimal fusion to produce a final decision. The experimental outcomes when evaluated on benchmark datasets are promising. Further analysis shows that optimized network weights can improve performance and make stable predictions.
    Deep Learning for Effective and Efficient Reduction of Large Adaptation Spaces in Self-Adaptive Systems. (arXiv:2204.06254v1 [cs.SE])
    Many software systems today face uncertain operating conditions, such as sudden changes in the availability of resources or unexpected user behavior. Without proper mitigation these uncertainties can jeopardize the system goals. Self-adaptation is a common approach to tackle such uncertainties. When the system goals may be compromised, the self-adaptive system has to select the best adaptation option to reconfigure by analyzing the possible adaptation options, i.e., the adaptation space. Yet, analyzing large adaptation spaces using rigorous methods can be resource- and time-consuming, or even be infeasible. One approach to tackle this problem is by using online machine learning to reduce adaptation spaces. However, existing approaches require domain expertise to perform feature engineering to define the learner, and support online adaptation space reduction only for specific goals. To tackle these limitations, we present 'Deep Learning for Adaptation Space Reduction Plus' -- DLASeR+ in short. DLASeR+ offers an extendable learning framework for online adaptation space reduction that does not require feature engineering, while supporting three common types of adaptation goals: threshold, optimization, and set-point goals. We evaluate DLASeR+ on two instances of an Internet-of-Things application with increasing sizes of adaptation spaces for different combinations of adaptation goals. We compare DLASeR+ with a baseline that applies exhaustive analysis and two state-of-the-art approaches for adaptation space reduction that rely on learning. Results show that DLASeR+ is effective with a negligible effect on the realization of the adaptation goals compared to an exhaustive analysis approach, and supports three common types of adaptation goals beyond the state-of-the-art approaches.
    Experimental Standards for Deep Learning Research: A Natural Language Processing Perspective. (arXiv:2204.06251v1 [cs.LG])
    The field of Deep Learning (DL) has undergone explosive growth during the last decade, with a substantial impact on Natural Language Processing (NLP) as well. Yet, as with other fields employing DL techniques, there has been a lack of common experimental standards compared to more established disciplines. Starting from fundamental scientific principles, we distill ongoing discussions on experimental standards in DL into a single, widely-applicable methodology. Following these best practices is crucial to strengthening experimental evidence, improve reproducibility and enable scientific progress. These standards are further collected in a public repository to help them transparently adapt to future needs.
  • Open

    Enabling Synthetic Data adoption in regulated domains. (arXiv:2204.06297v1 [cs.LG])
    The switch from a Model-Centric to a Data-Centric mindset is putting emphasis on data and its quality rather than algorithms, bringing forward new challenges. In particular, the sensitive nature of the information in highly regulated scenarios needs to be accounted for. Specific approaches to address the privacy issue have been developed, as Privacy Enhancing Technologies. However, they frequently cause loss of information, putting forward a crucial trade-off among data quality and privacy. A clever way to bypass such a conundrum relies on Synthetic Data: data obtained from a generative process, learning the real data properties. Both Academia and Industry realized the importance of evaluating synthetic data quality: without all-round reliable metrics, the innovative data generation task has no proper objective function to maximize. Despite that, the topic remains under-explored. For this reason, we systematically catalog the important traits of synthetic data quality and privacy, and devise a specific methodology to test them. The result is DAISYnt (aDoption of Artificial Intelligence SYnthesis): a comprehensive suite of advanced tests, which sets a de facto standard for synthetic data evaluation. As a practical use-case, a variety of generative algorithms have been trained on real-world Credit Bureau Data. The best model has been assessed, using DAISYnt on the different synthetic replicas. Further potential uses, among others, entail auditing and fine-tuning of generative models or ensuring high quality of a given synthetic dataset. From a prescriptive viewpoint, eventually, DAISYnt may pave the way to synthetic data adoption in highly regulated domains, ranging from Finance to Healthcare, through Insurance and Education.
    Data-heterogeneity-aware Mixing for Decentralized Learning. (arXiv:2204.06477v1 [cs.LG])
    Decentralized learning provides an effective framework to train machine learning models with data distributed over arbitrary communication graphs. However, most existing approaches toward decentralized learning disregard the interaction between data heterogeneity and graph topology. In this paper, we characterize the dependence of convergence on the relationship between the mixing weights of the graph and the data heterogeneity across nodes. We propose a metric that quantifies the ability of a graph to mix the current gradients. We further prove that the metric controls the convergence rate, particularly in settings where the heterogeneity across nodes dominates the stochasticity between updates for a given node. Motivated by our analysis, we propose an approach that periodically and efficiently optimizes the metric using standard convex constrained optimization and sketching techniques. Through comprehensive experiments on standard computer vision and NLP benchmarks, we show that our approach leads to improvement in test performance for a wide range of tasks.
    Features of the Earth's seasonal hydroclimate: Characterizations and comparisons across the Koppen-Geiger climates and across continents. (arXiv:2204.06544v1 [stat.AP])
    Detailed feature investigations and comparisons across climates, continents and time series types can progress our understanding and modelling ability of the Earth's hydroclimate and its dynamics. As a step towards these important directions, we here propose and extensively apply a multifaceted and engineering-friendly methodological framework for the thorough characterization of seasonal hydroclimatic dependence, variability and change at the global scale. We apply this framework using over 13 000 quarterly temperature, precipitation and river flow time series. In these time series, the seasonal hydroclimatic behaviour is represented by 3-month means of earth-observed variables. In our analyses, we also adopt the well-established Koppen-Geiger climate classification system and define continental-scale regions with large or medium density of observational stations. In this context, we provide in parallel seasonal hydroclimatic feature summaries and comparisons in terms of autocorrelation, seasonality, temporal variation, entropy, long-range dependence and trends. We find notable differences to characterize the magnitudes of most of these features across the various Koppen-Geiger climate classes, as well as between several continental-scale geographical regions. We, therefore, deem that the consideration of the comparative summaries could be more beneficial in water resources engineering contexts than the also provided global summaries. Lastly, we apply explainable machine learning to compare the investigated features with respect to how informative they are in explaining and predicting either the main Koppen-Geiger climate or the continental-scale region, with the entropy, long-range dependence and trend features being (roughly) found to be less informative than the remaining ones at the seasonal time scale.
    A Statistical Learning View of Simple Kriging. (arXiv:2202.07365v3 [stat.ML] UPDATED)
    In the Big Data era, with the ubiquity of geolocation sensors in particular, massive datasets exhibiting a possibly complex spatial dependence structure are becoming increasingly available. In this context, the standard probabilistic theory of statistical learning does not apply directly and guarantees of the generalization capacity of predictive rules learned from such data are left to establish. We analyze here the simple Kriging task, the flagship problem in Geostatistics: the values of a square integrable random field $X=\{X_s\}_{s\in S}$, $S\subset \mathbb{R}^2$, with unknown covariance structure are to be predicted with minimum quadratic risk, based upon observing a single realization of the spatial process at a finite number of locations $s_1,\; \ldots,\; s_n$ in $S$. Despite the connection of this minimization problem with kernel ridge regression, establishing the generalization capacity of empirical risk minimizers is far from straightforward, due to the non i.i.d. nature of the spatial data $X_{s_1},\; \ldots,\; X_{s_n}$ involved. In this article, nonasymptotic bounds of order $O_{\mathbb{P}}(1/n)$ are proved for the excess risk of a plug-in predictive rule mimicking the true minimizer in the case of isotropic stationary Gaussian processes observed at locations forming a regular grid. These theoretical results, as well as the role played by the technical conditions required to establish them, are illustrated by various numerical experiments and hopefully pave the way for further developments in statistical learning based on spatial data.
    Towards Practical Robustness Analysis for DNNs based on PAC-Model Learning. (arXiv:2101.10102v2 [cs.LG] UPDATED)
    To analyse local robustness properties of deep neural networks (DNNs), we present a practical framework from a model learning perspective. Based on black-box model learning with scenario optimisation, we abstract the local behaviour of a DNN via an affine model with the probably approximately correct (PAC) guarantee. From the learned model, we can infer the corresponding PAC-model robustness property. The innovation of our work is the integration of model learning into PAC robustness analysis: that is, we construct a PAC guarantee on the model level instead of sample distribution, which induces a more faithful and accurate robustness evaluation. This is in contrast to existing statistical methods without model learning. We implement our method in a prototypical tool named DeepPAC. As a black-box method, DeepPAC is scalable and efficient, especially when DNNs have complex structures or high-dimensional inputs. We extensively evaluate DeepPAC, with 4 baselines (using formal verification, statistical methods, testing and adversarial attack) and 20 DNN models across 3 datasets, including MNIST, CIFAR-10, and ImageNet. It is shown that DeepPAC outperforms the state-of-the-art statistical method PROVERO, and it achieves more practical robustness analysis than the formal verification tool ERAN. Also, its results are consistent with existing DNN testing work like DeepGini.
    Online greedy identification of linear dynamical systems. (arXiv:2204.06375v1 [stat.ML])
    This work addresses the problem of exploration in an unknown environment. For linear dynamical systems, we use an experimental design framework and introduce an online greedy policy where the control maximizes the information of the next step. In a setting with a limited number of experimental trials, our algorithm has low complexity and shows experimentally competitive performances compared to more elaborate gradient-based methods.
    Overparameterized Linear Regression under Adversarial Attacks. (arXiv:2204.06274v1 [stat.ML])
    As machine learning models start to be used in critical applications, their vulnerabilities and brittleness become a pressing concern. Adversarial attacks are a popular framework for studying these vulnerabilities. In this work, we study the error of linear regression in the face of adversarial attacks. We provide bounds of the error in terms of the traditional risk and the parameter norm and show how these bounds can be leveraged and make it possible to use analysis from non-adversarial setups to study the adversarial risk. The usefulness of these results is illustrated by shedding light on whether or not overparameterized linear models can be adversarially robust. We show that adding features to linear models might be either a source of additional robustness or brittleness. We show that these differences appear due to scaling and how the $\ell_1$ and $\ell_2$ norms of random projections concentrate. We also show how the reformulation we propose allows for solving adversarial training as a convex optimization problem. This is then used as a tool to study how adversarial training and other regularization methods might affect the robustness of the estimated models.
    The Exponentially Tilted Gaussian Prior for Variational Autoencoders. (arXiv:2111.15646v3 [cs.LG] UPDATED)
    An important property for deep neural networks is the ability to perform robust out-of-distribution detection on previously unseen data. This property is essential for safety purposes when deploying models for real world applications. Recent studies show that probabilistic generative models can perform poorly on this task, which is surprising given that they seek to estimate the likelihood of training data. To alleviate this issue, we propose the exponentially tilted Gaussian prior distribution for the Variational Autoencoder (VAE) which pulls points onto the surface of a hyper-sphere in latent space. This achieves state-of-the art results on the area under the curve-receiver operator characteristics metric using just the log-likelihood that the VAE naturally assigns. Because this prior is a simple modification of the traditional VAE prior, it is faster and easier to implement than competitive methods.
    The sparse Polynomial Chaos expansion: a fully Bayesian approach with joint priors on the coefficients and global selection of terms. (arXiv:2204.06043v1 [stat.CO])
    Polynomial chaos expansion (PCE) is a versatile tool widely used in uncertainty quantification and machine learning, but its successful application depends strongly on the accuracy and reliability of the resulting PCE-based response surface. High accuracy typically requires high polynomial degrees, demanding many training points especially in high-dimensional problems through the curse of dimensionality. So-called sparse PCE concepts work with a much smaller selection of basis polynomials compared to conventional PCE approaches and can overcome the curse of dimensionality very efficiently, but have to pay specific attention to their strategies of choosing training points. Furthermore, the approximation error resembles an uncertainty that most existing PCE-based methods do not estimate. In this study, we develop and evaluate a fully Bayesian approach to establish the PCE representation via joint shrinkage priors and Markov chain Monte Carlo. The suggested Bayesian PCE model directly aims to solve the two challenges named above: achieving a sparse PCE representation and estimating uncertainty of the PCE itself. The embedded Bayesian regularizing via the joint shrinkage prior allows using higher polynomial degrees for given training points due to its ability to handle underdetermined situations, where the number of considered PCE coefficients could be much larger than the number of available training points. We also explore multiple variable selection methods to construct sparse PCE expansions based on the established Bayesian representations, while globally selecting the most meaningful orthonormal polynomials given the available training data. We demonstrate the advantages of our Bayesian PCE and the corresponding sparsity-inducing methods on several benchmarks.
    A quantum generative model for multi-dimensional time series using Hamiltonian learning. (arXiv:2204.06150v1 [quant-ph])
    Synthetic data generation has proven to be a promising solution for addressing data availability issues in various domains. Even more challenging is the generation of synthetic time series data, where one has to preserve temporal dynamics, i.e., the generated time series must respect the original relationships between variables across time. Recently proposed techniques such as generative adversarial networks (GANs) and quantum-GANs lack the ability to attend to the time series specific temporal correlations adequately. We propose using the inherent nature of quantum computers to simulate quantum dynamics as a technique to encode such features. We start by assuming that a given time series can be generated by a quantum process, after which we proceed to learn that quantum process using quantum machine learning. We then use the learned model to generate out-of-sample time series and show that it captures unique and complex features of the learned time series. We also study the class of time series that can be modeled using this technique. Finally, we experimentally demonstrate the proposed algorithm on an 11-qubit trapped-ion quantum machine.
    Generalization Error Bounds for Multiclass Sparse Linear Classifiers. (arXiv:2204.06264v1 [math.ST])
    We consider high-dimensional multiclass classification by sparse multinomial logistic regression. Unlike binary classification, in the multiclass setup one can think about an entire spectrum of possible notions of sparsity associated with different structural assumptions on the regression coefficients matrix. We propose a computationally feasible feature selection procedure based on penalized maximum likelihood with convex penalties capturing a specific type of sparsity at hand. In particular, we consider global sparsity, double row-wise sparsity, and low-rank sparsity, and show that with the properly chosen tuning parameters the derived plug-in classifiers attain the minimax generalization error bounds (in terms of misclassification excess risk) within the corresponding classes of multiclass sparse linear classifiers. The developed approach is general and can be adapted to other types of sparsity as well.
    Epistemic Neural Networks. (arXiv:2107.08924v2 [cs.LG] UPDATED)
    Effective decision, exploration, and adaptation often require an agent to know what it knows and, also, what it does not know. This capability relies on the quality of \textit{joint} predictions of labels assigned to multiple inputs. Conventional neural networks lack this capability and, since most research has focused on marginal predictions, this shortcoming has been largely overlooked. By assessing the quality of joint predictions it is possible to determine whether a neural network effectively distinguishes between epistemic uncertainty (that due to lack of knowledge) and aleatoric uncertainty (that due to chance). We introduce the \textit{epistemic neural network} (ENN) as a general interface for uncertainty modeling in deep learning. While prior approaches to uncertainty modeling can be viewed as ENNs, the new interface facilitates comparison of joint predictions, and the design of novel architectures and algorithms. In particular, we introduce the \textit{epinet}: an architecture that can supplement any existing neural network, including pretrained models, and trained with modest incremental computation to represent uncertainty. With an epinet, conventional neural networks outperform very large ensembles, consisting of hundreds or more particles, with orders of magnitude less computation. We demonstrate this efficacy across synthetic data, ImageNet, and sequential decision problems. As part of this effort we open-source experiment code.
    Approximate Bayesian Computation via Classification. (arXiv:2111.11507v3 [stat.ME] UPDATED)
    Approximate Bayesian Computation (ABC) enables statistical inference in simulator-based models whose likelihoods are difficult to calculate but easy to simulate from. ABC constructs a kernel-type approximation to the posterior distribution through an accept/reject mechanism which compares summary statistics of real and simulated data. To obviate the need for summary statistics, we directly compare empirical distributions with a Kullback-Leibler (KL) divergence estimator obtained via contrastive learning. In particular, we blend flexible machine learning classifiers within ABC to automate fake/real data comparisons. We consider the traditional accept/reject kernel as well as an exponential weighting scheme which does not require the ABC acceptance threshold. Our theoretical results show that the rate at which our ABC posterior distributions concentrate around the true parameter depends on the estimation error of the classifier. We derive limiting posterior shape results and find that, with a properly scaled exponential kernel, asymptotic normality holds. We demonstrate the usefulness of our approach on simulated examples as well as real data in the context of stock volatility estimation.
    Random Graph Embedding and Joint Sparse Regularization for Multi-label Feature Selection. (arXiv:2204.06445v1 [stat.ML])
    Multi-label learning is often used to mine the correlation between variables and multiple labels, and its research focuses on fully extracting the information between variables and labels. The $\ell_{2,1}$ regularization is often used to get a sparse coefficient matrix, but the problem of multicollinearity among variables cannot be effectively solved. In this paper, the proposed model can choose the most relevant variables by solving a joint constraint optimization problem using the $\ell_{2,1}$ regularization and Frobenius regularization. In manifold regularization, we carry out a random walk strategy based on the joint structure to construct a neighborhood graph, which is highly robust to outliers. In addition, we give an iterative algorithm of the proposed method and proved the convergence of this algorithm. The experiments on the real-world data sets also show that the comprehensive performance of our method is consistently better than the classical method.
    Discovering Diverse Solutions in Deep Reinforcement Learning by Maximizing State-Action-Based Mutual Information. (arXiv:2103.07084v2 [stat.ML] UPDATED)
    Reinforcement learning algorithms are typically limited to learning a single solution for a specified task, even though diverse solutions often exist. Recent studies showed that learning a set of diverse solutions is beneficial because diversity enables robust few-shot adaptation. Although existing methods learn diverse solutions by using the mutual information as unsupervised rewards, such an approach often suffers from the bias of the gradient estimator induced by value function approximation. In this study, we propose a novel method that can learn diverse solutions without suffering the bias problem. In our method, a policy conditioned on a continuous or discrete latent variable is trained by directly maximizing the variational lower bound of the mutual information, instead of using the mutual information as unsupervised rewards as in previous studies. Through extensive experiments on robot locomotion tasks, we demonstrate that the proposed method successfully learns an infinite set of diverse solutions by learning continuous latent variables, which is more challenging than learning a finite number of solutions. Subsequently, we show that our method enables more effective few-shot adaptation compared with existing methods.
    Utilizing variational autoencoders in the Bayesian inverse problem of photoacoustic tomography. (arXiv:2204.06270v1 [physics.comp-ph])
    There has been an increasing interest in utilizing machine learning methods in inverse problems and imaging. Most of the work has, however, concentrated on image reconstruction problems, and the number of studies regarding the full solution of the inverse problem is limited. In this work, we study a machine learning based approach for the Bayesian inverse problem of photoacoustic tomography. We develop an approach for estimating the posterior distribution in photoacoustic tomography using an approach based on the variational autoencoder. The approach is evaluated with numerical simulations and compared to the solution of the inverse problem using a Bayesian approach.
    Time-uniform central limit theory with applications to anytime-valid causal inference. (arXiv:2103.06476v3 [math.ST] UPDATED)
    This work introduces time-uniform analogues of confidence intervals based on the central limit theorem (CLT). Our methods take the form of confidence sequences (CS) -- sequences of confidence intervals that are uniformly valid over time. CSs provide valid inference at arbitrary stopping times, incurring no penalties for "peeking" at the data, unlike classical confidence intervals which require the sample size to be fixed in advance. Existing CSs in the literature are nonasymptotic, requiring strong assumptions on the data, while the classical (fixed-time) CLT is ubiquitous due to the weak assumptions it imposes. Our work bridges the gap by introducing time-uniform CSs that only require CLT-like assumptions. While the CLT approximates the distribution of a sample average by that of a Gaussian at a fixed sample size, we use strong invariance principles like the seminal work of Koml\'os, Major, and Tusn\'ady to uniformly approximate the entire sample average process by an implicit Brownian motion. Applying Robbins' normal mixture martingale method to this Brownian motion then yields closed-form time-uniform boundaries. We combine these boundaries with doubly robust estimators to derive nonparametric CSs for the average treatment effect (and other causal estimands). These allow randomized experiments and observational studies to be continuously monitored and adaptively stopped, all while controlling the type-I error.
    Sigma-Delta and Distributed Noise-Shaping Quantization Methods for Random Fourier Features. (arXiv:2106.02614v2 [cs.LG] UPDATED)
    We propose the use of low bit-depth Sigma-Delta and distributed noise-shaping methods for quantizing the Random Fourier features (RFFs) associated with shift-invariant kernels. We prove that our quantized RFFs -- even in the case of $1$-bit quantization -- allow a high accuracy approximation of the underlying kernels, and the approximation error decays at least polynomially fast as the dimension of the RFFs increases. We also show that the quantized RFFs can be further compressed, yielding an excellent trade-off between memory use and accuracy. Namely, the approximation error now decays exponentially as a function of the bits used. Moreover, we empirically show by testing the performance of our methods on several machine learning tasks that our method compares favorably to other state of the art quantization methods in this context.
    Approximating Continuous Functions on Persistence Diagrams Using Template Functions. (arXiv:1902.07190v3 [cs.CG] UPDATED)
    The persistence diagram is an increasingly useful tool from Topological Data Analysis, but its use alongside typical machine learning techniques requires mathematical finesse. The most success to date has come from methods that map persistence diagrams into vector spaces, in a way which maximizes the structure preserved. This process is commonly referred to as featurization. In this paper, we describe a mathematical framework for featurization called \emph{template functions}, and we show that it addresses the problem of approximating continuous functions on compact subsets of the space of persistence diagrams. Specifically, we begin by characterizing relative compactness with respect to the bottleneck distance, and then provide explicit theoretical methods for constructing compact-open dense subsets of continuous functions on persistence diagrams. These dense subsets -- obtained via template functions -- are leveraged for supervised learning tasks with persistence diagrams. Specifically, we test the method for classification and regression algorithms on several examples including shape data and dynamical systems.
    Deep Probabilistic Time Series Forecasting using Augmented Recurrent Input for Dynamic Systems. (arXiv:2106.05848v2 [cs.LG] UPDATED)
    The demand of probabilistic time series forecasting has been recently raised in various dynamic system scenarios, for example, system identification and prognostic and health management of machines. To this end, we combine the advances in both deep generative models and state space model (SSM) to come up with a novel, data-driven deep probabilistic sequence model. Specifically, we follow the popular encoder-decoder generative structure to build the recurrent neural networks (RNN) assisted variational sequence model on an augmented recurrent input space, which could induce rich stochastic sequence dependency. Besides, in order to alleviate the inconsistency issue of the posterior between training and predicting as well as improving the mining of dynamic patterns, we (i) propose using a lagged hybrid output as input for the posterior at next time step, which brings training and predicting into alignment; and (ii) further devise a generalized auto-regressive strategy that encodes all the historical dependencies for the posterior. Thereafter, we first investigate the methodological characteristics of the proposed deep probabilistic sequence model on toy cases, and then comprehensively demonstrate the superiority of our model against existing deep probabilistic SSM models through extensive numerical experiments on eight system identification benchmarks from various dynamic systems. Finally, we apply our sequence model to a real-world centrifugal compressor forecasting problem, and again verify its outstanding performance by quantifying the time series predictive distribution.
    Efficient Non-parametric Bayesian Hawkes Processes. (arXiv:1810.03730v5 [cs.LG] UPDATED)
    In this paper, we develop an efficient nonparametric Bayesian estimation of the kernel function of Hawkes processes. The non-parametric Bayesian approach is important because it provides flexible Hawkes kernels and quantifies their uncertainty. Our method is based on the cluster representation of Hawkes processes. Utilizing the finite support assumption of the Hawkes process, we efficiently sample random branching structures and thus, we split the Hawkes process into clusters of Poisson processes. We derive two algorithms -- a block Gibbs sampler and a maximum a posteriori estimator based on expectation maximization -- and we show that our methods have a linear time complexity, both theoretically and empirically. On synthetic data, we show our methods to be able to infer flexible Hawkes triggering kernels. On two large-scale Twitter diffusion datasets, we show that our methods outperform the current state-of-the-art in goodness-of-fit and that the time complexity is linear in the size of the dataset. We also observe that on diffusions related to online videos, the learned kernels reflect the perceived longevity for different content types such as music or pets videos.
    Encoding Domain Knowledge in Multi-view Latent Variable Models: A Bayesian Approach with Structured Sparsity. (arXiv:2204.06242v1 [stat.ML])
    Many real-world systems are described not only by data from a single source but via multiple data views. For example, in genomic medicine, a patient can be described by data from different molecular layers. This raises the need for multi-view models that are able to disentangle variation within and across data views in an interpretable manner. Latent variable models with structured sparsity are a commonly used tool to address this modeling task but interpretability is cumbersome since it requires a direct inspection and interpretation of each factor via a specialized domain expert. Here, we propose MuVI, a novel approach for domain-informed multi-view latent variable models, facilitating the analysis of multi-view data in an inherently explainable manner. We demonstrate that our model (i) is able to integrate noisy domain expertise in form of feature sets, (ii) is robust to noise in the encoded domain knowledge, (iii) results in identifiable factors and (iv) is able to infer interpretable and biologically meaningful axes of variation in a real-world multi-view dataset of cancer patients.
    Time series features for supporting hydrometeorological explorations and predictions in ungauged locations using large datasets. (arXiv:2204.06540v1 [stat.ME])
    Regression-based frameworks for streamflow regionalization are built around catchment attributes that traditionally originate from catchment hydrology, flood frequency analysis and their interplay. In this work, we deviated from this traditional path by formulating and extensively investigating the first regression-based streamflow regionalization frameworks that largely emerge from general-purpose time series features for data science and, more precisely, from a large variety of such features. We focused on 28 features that included (partial) autocorrelation, entropy, temporal variation, seasonality, trend, lumpiness, stability, nonlinearity, linearity, spikiness, curvature and others. We estimated these features for daily temperature, precipitation and streamflow time series from 511 catchments, and then merged them within regionalization contexts with traditional topographic, land cover, soil and geologic attributes. Precipitation and temperature features (e.g., the spectral entropy, seasonality strength and lag-1 autocorrelation of the precipitation time series, and the stability and trend strength of the temperature time series) were found to be useful predictors of many streamflow features. The same applies to traditional attributes, such as the catchment mean elevation. Relationships between predictor and dependent variables were also revealed, while the spectral entropy, the seasonality strength and several autocorrelation features of the streamflow time series were found to be more regionalizable than others.
    SRMD: Sparse Random Mode Decomposition. (arXiv:2204.06108v1 [eess.SP])
    Signal decomposition and multiscale signal analysis provide many useful tools for time-frequency analysis. We proposed a random feature method for analyzing time-series data by constructing a sparse approximation to the spectrogram. The randomization is both in the time window locations and the frequency sampling, which lowers the overall sampling and computational cost. The sparsification of the spectrogram leads to a sharp separation between time-frequency clusters which makes it easier to identify intrinsic modes, and thus leads to a new data-driven mode decomposition. The applications include signal representation, outlier removal, and mode decomposition. On the benchmark tests, we show that our approach outperforms other state-of-the-art decomposition methods.
    GenIE: Generative Information Extraction. (arXiv:2112.08340v3 [cs.CL] UPDATED)
    Structured and grounded representation of text is typically formalized by closed information extraction, the problem of extracting an exhaustive set of (subject, relation, object) triplets that are consistent with a predefined set of entities and relations from a knowledge base schema. Most existing works are pipelines prone to error accumulation, and all approaches are only applicable to unrealistically small numbers of entities and relations. We introduce GenIE (generative information extraction), the first end-to-end autoregressive formulation of closed information extraction. GenIE naturally exploits the language knowledge from the pre-trained transformer by autoregressively generating relations and entities in textual form. Thanks to a new bi-level constrained generation strategy, only triplets consistent with the predefined knowledge base schema are produced. Our experiments show that GenIE is state-of-the-art on closed information extraction, generalizes from fewer training data points than baselines, and scales to a previously unmanageable number of entities and relations. With this work, closed information extraction becomes practical in realistic scenarios, providing new opportunities for downstream tasks. Finally, this work paves the way towards a unified end-to-end approach to the core tasks of information extraction. Code, data and models available at https://github.com/epfl-dlab/GenIE.
    Estimators of Entropy and Information via Inference in Probabilistic Models. (arXiv:2202.12363v2 [stat.ML] UPDATED)
    Estimating information-theoretic quantities such as entropy and mutual information is central to many problems in statistics and machine learning, but challenging in high dimensions. This paper presents estimators of entropy via inference (EEVI), which deliver upper and lower bounds on many information quantities for arbitrary variables in a probabilistic generative model. These estimators use importance sampling with proposal distribution families that include amortized variational inference and sequential Monte Carlo, which can be tailored to the target model and used to squeeze true information values with high accuracy. We present several theoretical properties of EEVI and demonstrate scalability and efficacy on two problems from the medical domain: (i) in an expert system for diagnosing liver disorders, we rank medical tests according to how informative they are about latent diseases, given a pattern of observed symptoms and patient attributes; and (ii) in a differential equation model of carbohydrate metabolism, we find optimal times to take blood glucose measurements that maximize information about a diabetic patient's insulin sensitivity, given their meal and medication schedule.
  • Open

    How flat is a normal mixture on top?
    Male and female heights both have a standard deviation of about 3 inches, with means of 70 inches and 64 inches. That’s a good first-pass model using round numbers. If you ask what the height of an average adult is, not specifying male or female, you get a mixture of two normal distributions. If we […] How flat is a normal mixture on top? first appeared on John D. Cook.  ( 2 min )
  • Open

    Self-Supervised MultiModal Versatile Networks
    Highlights Introduce the notion of a MultiModal Versatile (MMV) network, that can ingest multiple modalities and outputs common representations useful for downstream tasks; The paper especially studies how best to combine the modalities, so that the common representations respect properties deemed useful by the authors. Introduce the process of deflation, to efficiently apply networks on video data to static images.  ( 3 min )

  • Open

    Best sample text for voice synthesis? [D]
    I'm planning to create a clone of my own voice. Is there some kind of ideal sample text to record? I need 300 sentences. submitted by /u/headwar [link] [comments]
    [R] ML model to generate paths (lines) for a given image
    I am a researcher working on creating paths to indicate the primary and secondary neuronal connections in Corneal Confocal Microscopy images. The ground truth I have is images and sets of two-dimensional lines that indicate the primary and secondary paths as indicated in the image below (Ignore the green dots). The secondary and primary paths are always connected to each other. ​ https://preview.redd.it/r025qcpjddt81.jpg?width=834&format=pjpg&auto=webp&s=11a641988e0c6f8c1e17290fe9588f89ef530635 I am looking to find the most appropriate models to use for this task. The first thing that came to my mind was semantic segmentation. However, I am looking for other approaches that can be more suitable for drawing 1-pixel lines especially since the ground truth paths are indicated as 1-pixel-wide lines (1-pixel thickness) but the connections in the images have wider thicknesses. Any ideas for architecture or methods? submitted by /u/madr3z [link] [comments]  ( 2 min )
    [N] Followup response from BAAI on "A Roadmap for Big Model"
    Source: https://www.baai.ac.cn/portal/article/index/cid/4/id/404.html Statement on the Alleged Plagiarism by “A Roadmap for Big Model” It has come to our attention that the survey report “A Roadmap for Big Model” uploaded on arXiv by a BAAI team is suspected of plagiarism. Immediately upon learning of the allegations, an internal investigation was organized to confirm the issue. BAAI is also initiating an independent review by third-party experts to further asses the issue and accountabilities. As a research institution that attaches great importance to academic standards, BAAI holds a zero-tolerance policy towards academic misconduct. We express our sincerest apologies to the authors of the original papers and to all of those affected. The report in question constitutes a collection …  ( 2 min )
    [P] Image Restoration Using Swin Transformer in JavaScript
    Important note: Right now, the model only supports up sampling from any dimension to at most 256 pixels. I'll likely fix this restriction in the next few days. A few days back, I was searching for AI-based image up sampling models in for use within an offline JavaScript app. The latest approaches, such as SwinIR were unavailable for Javascript, so I just created a notebook that converts the SwinIR model from torch to TFJS in a relatively short kaggle kernel. I believe other transformer architectures can also be ported to JS like this. This is the link to the original paper of SwinIR. It requires around 1 GB of RAM to run. The size of model folder is 44 MB. It is quantized to float16. Anyway, hope someone will find this useful for their website or some other app. submitted by /u/Deep-Station-1746 [link] [comments]  ( 1 min )
    [D] Replacing 3x3 convolutions with two 2x2 convolutions
    Something that's always puzzled me is the ubiquitousness of 3x3 convolutions in computer vision. If I recall past discussion accurately, the main benefits of odd-sized kernels are that With the proper padding they maintain the width and height of their inputs, which makes it easier to think about/design neural network architectures. This is not possible with even kernels, unless you swallow the bullet and use asymmetric padding (which is rejected due to aesthetic reasons) Output pixels have a 1-to-1 mapping with input pixels (since odd-sized kernels have a proper "center"). This is considered a nice property -- perhaps (for instance) avoiding aliasing issues during segmentation tasks. Given these two points, we use 3x3 convolutions since they're the smallest odd-sized filter (exclud…  ( 6 min )
    [R] Do Deep Neural Networks Contribute to Multivariate Time Series Anomaly Detection ?
    submitted by /u/MVTS_Ano [link] [comments]  ( 1 min )
    [D] Open problem in modern RL that doesn't need a massive computational resources
    What are open and/or interesting problems in modern reinforcement learning that can be tackled by the average PhD/PostDoc who doesn't have access to a massive compute cluster? The problem shouldn't need us to train our model for 10 months like OpenAI's Dota2 model. Please share your thoughts. submitted by /u/ginger_beer_m [link] [comments]  ( 1 min )
    [D] What are the biggest developments in CV in last 5 years?
    I've been helping a friend of mine learn about CV, but my knowledge starts getting spotty around 2017-2018. In this spirit I'm hoping to discuss the biggest developments in CV in the last 5 years. I know that ViTs have been developed in that time but I'm hoping to fill in my knowledge gaps! A list of topics, important papers, big ideas, or anything else is much appreciated! ​ Edit: Thank you for the discussion everyone 🙌! I've been in and out of meetings but am reading through all responses now submitted by /u/SleekEagle [link] [comments]  ( 3 min )
    [P] How and where do you serve your model? Using kubernetes, docker, metal? Self developed or existing tools?
    Hi, I’m a machine learning platform engineer. I’ve been using, exploring and developing model deployment tools and platform for several years. Very often, I found that many of the tools or managed service of AI platform, are not very welcome by many users. Some think these tools are unnecessarily complicated. I'm currently developing a library in my free time trying to fill the gap. And I also want the library to get well integrated with most users' deployment environments. Would you like to share how and where do you serve your model? Using kubernetes? Self developed or existing tools? Thanks~ P.S. If you are interested, you can visit my project to submit an issue/PR or join the discussions, welcome to help: Pinferencia View Poll submitted by /u/Remote_Cancel_7977 [link] [comments]  ( 1 min )
    [P] I created a YouTube Thumbnail Dataset, and need some insight
    Hi guys! I recently created & published a dataset of YouTube video thumbnails on Kaggle (YouTube Thumbnail Dataset), I've tried to make the dataset as diverse as possible, It contains thumbnails from all varieties of YouTube channels. This dataset goes hand in hand with another dataset (containing YouTube video annotations) that I created, namely YouTubers Saying Things. The dataset contains 91 unique YouTube channels, and 10 categories, these categories are assigned by me manually to these channels. (Comedy, Science, Automobile, VideoGames, Food, Entertainment, Informative, Blog, News, Tech) All kinds of feedback and criticism are welcome, and also if you guys want some particular channel to be included in both these datasets, feel free to comment on this post, or raise an issue on the Github repositories for both these datasets, I will surely add them in the next version. Links to the datasets: YouTubers Saying Things Kaggle, Github YouTube Thumbnail Dataset Kaggle, Github submitted by /u/alcatraz2217 [link] [comments]  ( 1 min )
    Improve XGboost classification algorithm with small dataset, based on similar bigger dataset ? [D]
    Hi, I am doing researches about transfer learning for XGboost. I am currently working with a small dataset from a company in Spain (short history) and the scoring is poor. I have worked before with the same company in France and the scoring was great as I had plenty of data thanks to a big history. How could I improve my score with the data from Spain with the help of data from France ? Could I use transfer learning, or data mutualization, or data augmentation ? If anyone has faced before a similar problem, or has read some papers about it, I would love to hear about it. Thank you! submitted by /u/Cutset [link] [comments]  ( 1 min )
    [R] A Modern Self-Referential Weight Matrix That Learns to Modify Itself
    submitted by /u/hardmaru [link] [comments]
    [D] What are some interesting hidden stuff about CNNs?
    Hey all, Im trying to get up to date with Deep Learning literature, so the last week I was going through CNNs. Here's a general view of what Ive learned so far. Large filters suck, you can get better accuracy with smaller filters and more non linearities Depth is the most important for CNNs over width or filter sizes. ReLU activation is generally better, as Sigmoid/tanh gradients tend to fall off towards the ends Convolution layers are only translation invariant. Stacking multiple features together and passing them through MaxPool helps rotational invariance and scaling although not completely Residual connection help address vanishing gradient and help improve the overall training procedure Inception models worked well, as they mixed different filter sizes together helping the model learn diverse features Most current work is with Transformers, although Im not sure why. ConvNext shows similar performance can be achieved through large CNNs Do add to this if I missed anything, or if there's anything you don't know about submitted by /u/Bibbidi_Babbidi_Boo [link] [comments]  ( 5 min )
    [D] How to create scenes with text - Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors, a 5-minute paper summary by Casual GAN Papers
    The authors of Make-A-Scene propose a novel text-to-image method that leverages the information from an additional input condition called a “scene” in the form of segmentation tokens to improve the quality of generated images and enable scene editing, out-of-distribution prompts, and text-editing of anchor scenes. As for the details, let’s dive in, shall we? Full summary: https://t.me/casual_gan/284 Blog post: https://www.casualganpapers.com/text-to-image-vqvae-scene-generation/Make-A-Scene-explained.html Make-A-Scene arxiv / code (by Casual GAN Papers Community) Join the discord community and follow on Twitter for weekly AI paper summaries! submitted by /u/KirillTheMunchKing [link] [comments]  ( 1 min )
  • Open

    Fallacy of Becoming Data-driven – Part 1: Becoming Value-obsessed
    I’m sure we all remember the story of “The Little Engine That Could.” A little railroad engine was built for pulling a few cars on and off the switches. When more powerful engines are asked to pull a load over a steep hill, they respond “I can’t; that is too much a pull for me”.… Read More »Fallacy of Becoming Data-driven – Part 1: Becoming Value-obsessed The post Fallacy of Becoming Data-driven – Part 1: Becoming Value-obsessed appeared first on Data Science Central.  ( 6 min )
    DSC Weekly Digest 4/12/2022: Demographics Drives Analytics
    The Los Angeles Times recently reported on a growing problem not just for California School Districts, but across much of the Northern Hemisphere: The number of children entering school has been dropping steadily for five years now, and is changing the dynamics of education. What’s worse, those declines are accelerating. Sometimes understanding the future comes… Read More »DSC Weekly Digest 4/12/2022: Demographics Drives Analytics The post DSC Weekly Digest 4/12/2022: Demographics Drives Analytics appeared first on Data Science Central.  ( 6 min )
  • Open

    Control access to Amazon SageMaker Feature Store offline using AWS Lake Formation
    You can establish feature stores to provide a central repository for machine learning (ML) features that can be shared with data science teams across your organization for training, batch scoring, and real-time inference. Data science teams can reuse features stored in the central repository, avoiding the need to reengineer feature pipelines for different projects and […]  ( 10 min )
    Manage dialog to elicit Amazon Lex slots in Amazon Connect contact flows
    Amazon Lex can add powerful automation to contact center solutions, so you can enable self-service via interactive voice response (IVR) interactions or route calls to the appropriate agent based on caller input. These capabilities can increase customer satisfaction by streamlining the user experience, and improve containment rates in the contact center. In both the self-service […]  ( 6 min )
  • Open

    AI Trippy Dream 35 - Psychedelic Special Request
    submitted by /u/LordPewPew777 [link] [comments]
    Ohio State University Researchers Develop SAT2LoD2: An Open-Source Python Tool For 3D Landscape Modelling Using Satelite Imagery
    3D landscape modeling has seen a rise in its popularity and applications in recent years. It has countless applications in the fields of civil engineering, earth sciences, military applications, and many others. Geometric 3D models are typically developed using the city geography markup language (CityGML), and the Level-of-Detail (LoD) building model is the preferred model for building 3D models using CityGML. The use of Satellite imagery for landscape modeling provides the advantage of covering a wide area and is low cost. However, developing LoD2 models using satellite imagery remains a big challenge. Building models in such a way involves complex steps demanding heuristics-based approaches and ML-based detection paradigms. In a recent paper, researchers at the Ohio State University propose a SAT2LoD2 to facilitate the development of 3D landscape models. SAT2LoD2 is an open-source, python-based GUI-enabled software that takes the satellite images as inputs and returns LoD2 building models as outputs. The software also has the feature of taking road networks and custom maps as additional inputs for better results. Continue Reading Paper: https://arxiv.org/pdf/2204.04139v1.pdf Github: https://github.com/gdaosu/lod2buildingmodel submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    Are there AIs which are able to simulate a human body when you shoot/hit it, that you can use for video games?
    submitted by /u/TheblackRook3 [link] [comments]  ( 1 min )
    Digital Folktales, a collection of short stories about internet folklore, written and illustrated by Artificial Intelligence
    submitted by /u/fabianmosele [link] [comments]  ( 1 min )
    https://youtu.be/0x0to1wNh6s
    https://youtu.be/0x0to1wNh6s A new enterprise project model supported by AI. In the near future, the growing introduction of automation and artificial intelligence will require the updating of most of the activities in the production world, along with changes to contracts, tasks, and integration processes between man and machine. This is supported by the Accenture "IT's Learning" study, according to which 81% of jobs will suffer the impact of AI and robotization. submitted by /u/neologos52 [link] [comments]  ( 1 min )
    How to improve your video editing software with AI?
    submitted by /u/tah_zem [link] [comments]
    Top Ethical Challenges in AI – The Price of Progress
    What does 2022 look like for AI? Let's find out. https://us.sganalytics.com/blog/top-ethical-challenges-in-ai-the-price-of-progress/ submitted by /u/JencyJane [link] [comments]
    Bias in Artificial Intelligence: Is Diversity the Key to the Future Of AI?
    submitted by /u/JencyJane [link] [comments]
    Wrote about KNN — Introduction to DataScience Book
    submitted by /u/mindaslab [link] [comments]
    How to create scenes with text - Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors, a 5-minute paper summary by Casual GAN Papers
    The authors of Make-A-Scene propose a novel text-to-image method that leverages the information from an additional input condition called a “scene” in the form of segmentation tokens to improve the quality of generated images and enable scene editing, out-of-distribution prompts, and text-editing of anchor scenes. As for the details, let’s dive in, shall we? Full summary: https://t.me/casual_gan/284 Blog post: https://www.casualganpapers.com/text-to-image-vqvae-scene-generation/Make-A-Scene-explained.html Make-A-Scene arxiv / code (by Casual GAN Papers Community) Join the discord community and follow on Twitter for weekly AI paper summaries! submitted by /u/KirillTheMunchKing [link] [comments]  ( 1 min )
  • Open

    Measuring Goodhart’s Law
    Goodhart’s law famously says: “When a measure becomes a target, it ceases to be a good measure.” Although originally from economics, it’s something we have to grapple with at OpenAI when figuring out how to optimize objectives that are difficult or costly to measure.  ( 4 min )
  • Open

    MIT Schwarzman College of Computing unveils Break Through Tech AI
    New program strives to bridge the talent gap for underrepresented groups in the tech industry.  ( 5 min )
    Engineers enlist AI to help scale up advanced solar cell manufacturing
    Perovskite materials would be superior to silicon in PV cells, but manufacturing such cells at scale is a huge hurdle. Machine learning can help.  ( 7 min )
  • Open

    Simple and Effective Zero-Shot Task-Oriented Dialogue
    Posted by Jeffrey Zhao and Raghav Gupta, Software Engineers, Google Research Modern conversational agents need to integrate with an ever-increasing number of services to perform a wide variety of tasks, from booking flights and finding restaurants, to playing music and telling jokes. Adding this functionality can be difficult — for each new task, one needs to collect new data and retrain the models that power the conversational agent. This is because most task-oriented dialogue (TOD) models are trained on a single task-specific ontology. An ontology is generally represented as a list of possible user intents (e.g., if the user wants to book a flight, if the user wants to play some music, etc.) and possible parameter slots to extract from the conversation (e.g., the date of the flight, the…  ( 8 min )
  • Open

    Hilbert transform and Fourier series
    A few days ago I wrote about the Hilbert transform and gave as an example that the Hilbert transform of sine is cosine. We’ll bootstrap that example to find the Hilbert transform of any periodic function from its Fourier series. The Hilbert transform of a function f(t) is a function fH(x) defined by where the […] Hilbert transform and Fourier series first appeared on John D. Cook.  ( 2 min )
    Logarithms yearning to be free
    I got an evaluation copy of The Best Writing on Mathematics 2021 yesterday. One article jumped out as I was skimming the table of contents: A Zeroth Power Is Often a Logarithm Yearning to Be Free by Sanjoy Mahajan. Great title. There are quite a few theorems involving powers that have an exceptional case that […] Logarithms yearning to be free first appeared on John D. Cook.  ( 2 min )
  • Open

    Is the number of dimensions in the latent space equal to the number of the neurons of the layer? Or perhaps number of neurons in the whole neural network?
    ​ https://preview.redd.it/qtgp7dzx2bt81.png?width=850&format=png&auto=webp&s=ccf391e8a1613d5405c137296bdf853010fc3f19 Not speaking specifically about autoencoders here, but about general neural networks. As I understand correctly, "latent space" refers to one of the fully connected layers of the network and the dimensionality of the space is equal to the number of the neurons in this layer. This would mean, that each of the layers has a different "latent space" representation of the learned data distribution. Do I understand it correctly? I got really confused because people seem to sometimes refer to latent space as to all of the possible activations of all of the neurons in the network (each neuron of the network is one dimension of a latent space) OR EVEN to all of the PARAMETERS of the network (each parameter is one dimension of the latent space (??)). Do we have a separate name for these? How do we call the parameter space of a neural network? Is my original intuition even correct? submitted by /u/bzqp2 [link] [comments]  ( 1 min )
    Researchers Propose a Novel Framework ‘LilNetX’ For Training Deep Neural Network With Extreme Model Compression, and Structured Sparsification
    In this research, the researchers from the paper ‘ LilNetX: Lightweight Networks with EXtreme Model Compression and Structured Sparsification’ talk about the importance of larger parameter-heavy and computationally costly architectures in deep neural networks (DNNs) and how it improves the computer vision tasks. They also mentioned in the paper that it is not as simple as it seems since, as the DNNs become more common in the business, they are frequently required to be trained multiple times, communicated across the network to various devices, and executed under hardware limits with minimum loss of accuracy, all while maintaining accuracy. Then the question arises of how to reduce the models’ size on the devices while still enhancing their run-time. Explorations in this field have tended to take one of two paths: lowering model size via compression approaches or reducing computing demands through model pruning. The main achievement of this research from the University of Maryland and Google Research is the introduction of ‘LilNetX’, an end-to-end trainable neural network technique that allows learning models with specified accuracy-rate-computation trade-offs. Prior work has taken a piecemeal approach to these difficulties, which necessitates post-processing or multistage training, which is not efficient and does not scale well for big datasets or architectures. To encourage modest model size, the strategy is to create a joint training goal that penalizes the self-information of network parameters in a reparameterized latent space while simultaneously incorporating priors to increase structured sparsity in the parameter space to decrease computation. Continue Reading Paper: https://arxiv.org/pdf/2204.02965.pdf Github: https://github.com/Sharath-girish/LilNetX submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    What would happen if you connect inputs and outputs randomly to a large hebbian Spiking NN and let it learn shape itself in an environment.
    submitted by /u/The_impact_theory [link] [comments]  ( 2 min )
    noob here who doesn’t really understand calculus
    If i want to take the partial derivative of the error with respect to a certain weight, it would be similar to taking the derivative of say y = value * weight + bias but if i hold the value and bias still, the derivative just becomes the value of the weight, like how the derivative of y = 3x is just 3… so what do I do? it doesn’t make sense to multiply 3 by a learning variable and make that the new weight, so what am I missing? submitted by /u/-i-hate-this-place- [link] [comments]  ( 2 min )
  • Open

    Does the reward in reinforcement learning have to be immediate reward?
    I'm trying to train a seq2seq model that generates a sentence with T words using reinforcement learning. The input and all the previously generated words form the state of the environment, and generating a word in the sentence is considered an action. In the previous methods [1, 2], the immediate reward r(s_t, a_t, s_{t+1}) for the t-th action a_t is 0 when t < T, and the reward is the CIDEr score (a scalar that measures the quality of the sentence) of the entire sentence when t = T. The policy is updated after the entire sentence is generated. I designed a new reward for each action, and the new reward for the t-th action is not zero when t < T. However, the reward of each action can only be calculated when the entire sentence is generated since it relies on the CIDEr score of the entire sentence, i.e. the reward for all the actions relies on the final state s_T. Can I still define the reward in the form of r(s_t, a_t, s_{t+1}) ? [1] Rennie et al. Self-critical sequence training for image captioning, CVPR 2017: 7008-7024. [2] Ranzato et al. Sequence level training with recurrent neural networks, ICLR 2016. submitted by /u/entalent [link] [comments]  ( 1 min )
    Question about math
    I am reading that paper A Distributional Perspective on Reinforcement Learning, and it is related to measure theory. Is it worth to spend time to study whole real analysis and measure theory? submitted by /u/Professional_Card176 [link] [comments]  ( 1 min )
    What does an oscillating explained_variance signify during training? (PPO)
    submitted by /u/C_BearHill [link] [comments]  ( 1 min )
    SB3- HER+DQN for my simple discrete map env but the training result is pretty bad
    Hi all, I am creating a multiple-goal environment. Which is an 8*8 discrete map with a start and terminal state (only one) change after each episode. The reward is 100 for reaching the terminal state and -1 for the rest. In fact, I am not sure if the reward is reasonable. I used PPO from SB3 and I can easily finish it. But when I go offline, using HER+DQN, the training is very bad. Feel free to run it here or take a look at the env and training result. Thank you so much! https://colab.research.google.com/drive/1Mt5Yje7GTyjOBHL09zC9C1L05xpTAK9v?usp=sharing submitted by /u/AnimatorRemarkable20 [link] [comments]  ( 1 min )
    What would happen if you connect inputs and outputs randomly to a large hebbian Spiking NN and let it learn shape itself in an environment.
    submitted by /u/The_impact_theory [link] [comments]  ( 2 min )
    Number of Feature VS Action Space in Multi-agent Reinforcement Learning
    Hi All, I am working on a MARL fintech project where I use DDQN and for Q-value, I use LSTM bacause it is time series data. This is a project overview. It has 7 features which is derivatives of ask and bid price and has 12 action spaces for action taking. Is it possible to generate a good reliable model using only 7 features for 12 action spaces? Number of feature or quality of feature is important for taking good decision in RL. Open for Suggestion #Reinforcement_Learning #MARL submitted by /u/laxuu [link] [comments]  ( 1 min )
    Is there anyone interested in re-implementing APT?
    Hi, these day I really interested in self-supervised RL. Especially only based on state novelty. So I wanted to re-implement APT(Behavior From the Void: Unsupervised Active Pre-Training). but my re-implementation showed not meaningful behaviors compared to official implementation. official implementation uses drq-v2 and intrinsic curiosity module. So, I want to re-implement APT as described in paper(using drq-v1 and contrastive learning). Is there someone to check my reimplementation? https://github.com/seolhokim/apt In that repository, DrQ-v1 works well, but only apt doesn't work! I can't understand why agent stop moving when pre-training. ​ Really thank you for reading. submitted by /u/Spiritual_Fig3632 [link] [comments]  ( 1 min )
  • Open

    The Voice Synthesis Business: 2022 Update
    In the past few years, high-quality automated text-to-speech synthesis has effectively become a commodity, with easy access to cloud-based… Continue reading on Becoming Human: Artificial Intelligence Magazine »  ( 15 min )
  • Open

    Massaging Data using Pandas
    When we talk about managing data, it is quite inevitable to see data presented in tables. With column header, and […] The post Massaging Data using Pandas appeared first on Machine Learning Mastery.  ( 25 min )
  • Open

    MLCommons’ David Kanter, NVIDIA’s Daniel Galvez on Improving AI with Publicly Accessible Datasets
    In deep learning and machine learning, having a large enough dataset is key to training a system and getting it to produce results. So what does a ML researcher do when there just isn’t enough publicly accessible data? Enter the MLCommons Association, a global engineering consortium with the aim of making ML better for everyone. Read article > The post MLCommons’ David Kanter, NVIDIA’s Daniel Galvez on Improving AI with Publicly Accessible Datasets appeared first on NVIDIA Blog.  ( 2 min )
  • Open

    Just Tech: Centering Community-Driven Innovation at the Margins Episode 3 with Dr. Sasha Costanza-Chock
    Episode 135 | April 13, 2022 In “Just Tech: Centering Community-Driven Innovation at the Margins,” Senior Principal Researcher Mary L. Gray explores how technology and community intertwine and the role technology can play in supporting community-driven innovation and community-based organizations. Dr. Gray and her team are working to bring computer science, engineering, social science, and […] The post Just Tech: Centering Community-Driven Innovation at the Margins Episode 3 with Dr. Sasha Costanza-Chock appeared first on Microsoft Research.  ( 31 min )

  • Open

    How IoT Uses Machine Learning To Change The World
    IoT and Machine Learning are the most advanced and evolving technologies that continue to rise in today’s modern world, simplifying human efforts and making lives easier. These technologies have proved to streamline operations and workflows for various industries and provide more robust and scalable applications that allow users to make things done seamlessly.  In recent… Read More »How IoT Uses Machine Learning To Change The World The post How IoT Uses Machine Learning To Change The World appeared first on Data Science Central.  ( 6 min )
    Drag-and-drop Data Pipelining: The Next Disruptor in ML
    Recent advances in machine learning (ML) and artificial intelligence (AI) technologies are helping enterprises across industries quickly move from their use cases from the pilot stage to production and operationalization. According to a report by McKinsey & Company, by 2030, businesses that fully absorb AI could double their cash flow, while companies that don’t could… Read More »Drag-and-drop Data Pipelining: The Next Disruptor in ML The post Drag-and-drop Data Pipelining: The Next Disruptor in ML appeared first on Data Science Central.  ( 3 min )
    Advances Highlight the Future of IoT Security
    As the Internet of Things (IoT) is gradually moving from being a centralized structure to a more complex network of innumerable decentralized smart devices, the need for security of data will be acknowledged to a greater degree, thereby promoting the expansion of the global IoT security market. The larger the volume of the data transferred… Read More »Advances Highlight the Future of IoT Security The post Advances Highlight the Future of IoT Security appeared first on Data Science Central.  ( 3 min )
  • Open

    How to Win a Kaggle Competition with Bayesian Optimization
    submitted by /u/aidev2040 [link] [comments]
    The Moment A Neural Net Became Sentient For The First Time - AI Art Story [4K] #shorts
    submitted by /u/fooo-ooo [link] [comments]
  • Open

    How to Win a Kaggle Competition with Bayesian Optimization
    submitted by /u/aidev2040 [link] [comments]
    Should I use A Encoder Decoder CNN
    I'm trying to make a model to play a car racing simulator. I have a dataset with the inputs used(human) to get fast lap times. I would like to make a model that reads the game video output and predicts the arrow key inputs to get a fast lap time. It seems, to me, that a CNN with encoder-decoder layers trained on the keyboard inputs would work. Is this a good architecture? I'm also having a hard time finding useful literature. please let me know if there is anything I should look into or do differently. submitted by /u/newroadkill [link] [comments]  ( 1 min )
    Top Trends & Predictions That Will Drive Data Science, AI and Machine Learning in 2022
    submitted by /u/saik2363 [link] [comments]  ( 1 min )
    Last Week in AI: OpenAI DALL-E 2 generates amazing images, Google's 540 billion parameters language model, Clearview AI branches out beyond police, and more!
    submitted by /u/regalalgorithm [link] [comments]  ( 1 min )
    Top Trends & Predictions That Will Drive Data Science in 2022
    submitted by /u/saik2363 [link] [comments]
    Conversation about the future, life and AGI
    submitted by /u/HumanSeeing [link] [comments]  ( 1 min )
    AI predicts if and when someone will experience cardiac arrest
    submitted by /u/qptbook [link] [comments]
    The last Woolly Mammoth on Earth
    submitted by /u/Ok-Passion-6574 [link] [comments]
    The last Woolly Mammoth on Earth
    Is it good or bad? Also, I was wondering what art goes big as an NFT? submitted by /u/Ok-Passion-6574 [link] [comments]
    Stanford Researchers Introduced a Novel Deep Learning Computer-Assisted System for Real-Time Open Surgery and AVOS (the Annotation Videos of Open Surgery) Dataset
    In recent years, the rise of Deep Learning has continuously brought innovations to many fields, and the medical domain is one of them. AI applications in this field are countless: from pre-operative diagnosis to disease classification, from skill assessment to post-operative rehabilitation. Among them, systems to assess surgical skills and provide feedback to improve technique could help in decreasing the number of complications in surgical procedures, which are still the third leading cause of death globally. AI can be an additional coach for surgical trainees and an expert colleague for experienced surgeons. But, to train an AI system, reliable data are fundamental. The more utilized type of data in this context is undoubtedly video streams, as a camera is less invasive than other types of sensors, such as ArmBand or EEG, which could weigh on the surgeon’s performance given their physical bulk. This applies particularly to laparoscopic surgery, where an in-body fiber-optic camera is used to visualize the operating area and facilitate rapid data collection. For this reason, the majority of computer-assisted systems focus on laparoscopic surgery. Continue Reading Paper: https://arxiv.org/pdf/2112.07219.pdf https://preview.redd.it/84mwdw81l4t81.png?width=741&format=png&auto=webp&s=636aff067560876d14f37caccb83bd951e991c68 submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    Machine Learning vs. Cookie Consent Systems
    submitted by /u/DaveBowman1975 [link] [comments]
    Artificial Nightmares: Stone Golem Ruins || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
    My epiphany on synthetic media five years later, and what I feel is coming within the next five years
    Roughly five years ago, I created this thread where I outlined my realization about the imminency of synthetic media. This was before transformers blew up, before StyleGAN, before GPT-2, when WaveNet and DeepDream were still among the best we could do, and when predictive text algorithms that were barely better than Markov Chains were still the state of the art. In five short years, the state of artificial intelligence has changed overwhelmingly, to the point it's barely recognizable. Looking back to 2017, I now get this sense of everything feeling so primitive and fake. I've stated many times that AI before roughly 2019 was a bunch of digital magic tricks, and the field as a whole was essentially a giant Potemkin village that utilized clever sleight of hand and advertising to make it se…  ( 7 min )
  • Open

    [N] Substantial plagiarism in BAAI’s “a Road Map for Big Models”
    BAAI recently released a two hundred page position paper about large transformer models which contains sections that are plagiarized by over a dozen other papers. In a massive fit of irony, this was found by Nicholas Carlini, a research who (among other things) is famous for studying how language models copy outputs from their training data. Read the blog post here submitted by /u/StellaAthena [link] [comments]  ( 2 min )
    [D] What's the status of live speech-to-speech conversions? (not TTS)
    I've been trying to find information about the subject, but almost every result is TTS, and the only example of what I actually want (Respeecher) costs 2 grand a year. Are there any (preferably open-source) other alternatives? submitted by /u/UncertainOutcome [link] [comments]  ( 1 min )
    Comparison of workshops at major conferences. [D]
    I understand that workshop quality is dependent more on the workshop itself rather than the host conference. But, in general, how are workshops from CVPR, NeurIPS, ICLR, ICML, etc. viewed by the community in relation to one another? submitted by /u/avd4292 [link] [comments]  ( 1 min )
    [Project] Leniax - A Lenia simulation library powered by JAX
    Hi everyone! I'm really happy to finally publish the work I've been doing on the Cellular Automata called Lenia. It is a JAX library called Leniax and allows one to simulate thousands of simulations in parallel using CPU, GPU, or TPU. With it, you can: Simulate Conway's Game of Life Simulate multiple Lenia simulations in parallel Use gradient descent to search for Continuous CA parameters Launch a QD search to discover a ton of diversity in Lenia. Check out the blog post for some visual results The main goal of this work was to advance the state of automatic discovery for those systems. 10 months ago, I bet on QD to do so, turns out it indeed works! QD algorithms really rock! The code is completely open-source with all the examples, notebooks, and even experiments I ran. (See the doc for more links) I would love to have feedback on this and of course, if you find that subject interesting, engage with our community! Cheers! submitted by /u/morgangiraud [link] [comments]  ( 2 min )
    [D] Effective Image Pre-Processing Techniques for Enhancing Defects in an Image?
    So I am doing some object detection on Pavement Defects. I've already collected the data with some annotations but the model is performing rather poorly. For example, the `maP` is about `0.12` for the whole data. By examining the data, I think one of the reasons is that some of the defects such as cracks, or faded pavement markings are not so clear and either casted by a shadow or too bright from the sun. Image Example #1 Or from motion blur Image Example #2 Is there any image preprocessing technique aside maybe from CLAHE that could be applied? Moreover, I am currently using YOLOv5 for this. submitted by /u/sarmientoj24 [link] [comments]  ( 3 min )
    [P] the copent package v0.2.3 available on PyPI now
    The copent package implements the method for estimating copula entropy (mutual information) and transfer entropy (conditional mutual information / conditional independence). This version add a new feature (an argument 'mode') for dealing with large data when memory is limited. Github: https://github.com/majianthu/pycopent PyPI: https://pypi.org/project/copent/ any comments are welcome. submitted by /u/majianthu [link] [comments]  ( 1 min )
    [D] Feedback on a worked Continuous Deployment Example (CI/CD/CT)
    Hey everyone! At ZenML, we released today an integration that allows users to train and deploy models from pipelines in a simple way. I wanted to ask the community here whether the example we showcased makes sense in a real-world setting: Context ZenML is an extensible, open-source MLOps framework to create production-ready machine learning pipelines. Built for data scientists, it has a simple, flexible syntax, is cloud- and tool-agnostic, and has interfaces/abstractions that are catered towards ML workflows. Seldon Core is a production grade open source model serving platform. It packs a wide range of features built around deploying models to REST/GRPC microservices that include monitoring and logging, model explainers, outlier detectors and various continuous deployment strategies such…  ( 2 min )
    [D] Can we decrease the training time of a deep learning model by using a domain specific pretrained backbone instead of the standard imagenet?
    I am working in the retail domain atm, and train a lot of image classifiers. I have always used imagenet as pretrained to train my model upon. I thought it would be straightforward to train a backbone on a big retail dataset(1000+ classes), and then use that as pretrained and it'll reduce the time it takes for my models to generalize. Turns out, the model took more epochs to train when using the retail backbone, then the imagenet one. Isn't this counter-intuitive? What else can I do to make by backbone better? submitted by /u/lMAObigZEDONG [link] [comments]  ( 1 min )
    [D] Removing Unpredictable Samples from a Training Set
    Hi, I have a fairly interesting project that I am working on. I have a model that has some samples which are completely unpredictable, random noise, and some that are reliably predictable. How would you go about separating out the samples which can be predicted, identifying them going forward, and retraining on a cleaned dataset with only those samples? Interested to see someone else's approach to this. Edit: I forgot to mention that my data is from an embedding matrix from ordinal categorical features. submitted by /u/Katapilla_Killa [link] [comments]  ( 2 min )
    [D] What's your experience with Model-Agnostic Meta-Learning in RL?
    There is the original paper and there was a subsequent paper by other authors titled: On the Convergence Theory of Debiased Model-Agnostic Meta_Reinforcement Learning I've been working on implementing the latter paper on the HalfCheetah environment. However, my attempts have been unsuccessful so far (I know the authors provided the code, but I am trying to write my own code to check my understanding). I'd like to know any tips/tricks that anyone can share and just to know about people's experiences, especially using MAML for RL. submitted by /u/carlml [link] [comments]  ( 1 min )
    How to deal with the fact that whatever idea I have has already been published.
    I'm always having these ideas for projects and papers, then I look around a bit, and I find someone that has already studied that idea and published it. It's genuinely annoying, It's been 6 months now, and all the papers are newly published (2021 mostly) so It's even more annoying. How do you deal with that ? and How do you find a niche that no one is touching. I just started a PhD, so It's really stressing me out. I feel like I'll never be able to advance on my thesis, and that I should just quit, because better work has already been done. submitted by /u/AlanRoofies [link] [comments]  ( 4 min )
    [P] Faster version of cv2.BFMatcher(cv2.NORM_L2) optimized for keypoints matching
    Hi there, in the case if any of you use the openCV BFMatcher with NORM_L2, you can try to use my recent pet project: https://github.com/kmkolasinski/fast-bfmatcher Basically the speed-up is achieved by using faster replacement for BLAS, a BLIS library and some custom implementations written in C and cython. submitted by /u/kmkolasinski [link] [comments]  ( 1 min )
    [D] Are there any comparison studies on learning rate schedules for generative transformers?
    My current research heavily involves generative vision transformers and after some experimentation it seems like the choice of a LR scheduler is a crucial factor for proper convergence. Does anyone know of any comparison studies done recently that explore various types of schedulers for generative tasks? submitted by /u/Megixist [link] [comments]  ( 4 min )
    [N] Fine-Tuning LayoutLM v2 For Invoice Recognition
    With the advent of deep learning models, automated data extraction is becoming more accessible. In this article, we demonstrate step-by-step how to fine-tune layoutLM V2 on invoices starting from data annotation to model training and inference. Enjoy the read and if you have any questions, leave them below. submitted by /u/UBIAI [link] [comments]  ( 1 min )
  • Open

    Lidar-Camera Deep Fusion for Multi-Modal 3D Detection
    Posted by Yingwei Li, Student Researcher, Google Cloud and Adams Wei Yu, Research Scientist, Google Research, Brain Team LiDAR and visual cameras are two types of complementary sensors used for 3D object detection in autonomous vehicles and robots. LiDAR, which is a remote sensing technique that uses light in the form of a pulsed laser to measure ranges, provides low-resolution shape and depth information, while cameras provide high-resolution shape and texture information. While the features captured by LiDAR and cameras should be merged together to provide optimal 3D object detection, it turns out that most state-of-the-art 3D object detectors use LiDAR as the only input. The main reason is that to develop robust 3D object detection models, most methods need to augment and transform th…  ( 8 min )
  • Open

    Very Deep Neural Networks Explained in 40 Seconds
    By Vincent Granville, Ph.D., Author at MLtechniques.com Sponsored Post Very deep neural networks (VDNN) illustrated with data animation: a 40 second […] The post Very Deep Neural Networks Explained in 40 Seconds appeared first on Machine Learning Mastery.  ( 4 min )
    Scientific Functions in NumPy and SciPy
    Python is a general-purpose computation language, but it is very welcomed in scientific computing. It can replace R and Matlab […] The post Scientific Functions in NumPy and SciPy appeared first on Machine Learning Mastery.  ( 12 min )
  • Open

    What can you tell us about him?
    (A Sci-Fi Ultrashort)  ( 5 min )
  • Open

    MIT’s FutureMakers programs help kids get their minds around — and hands on — AI
    The programs are designed to foster an understanding of how artificial intelligence technologies work, including their social implications.  ( 8 min )
  • Open

    Best GridWorld environment?
    In your opinion, what is the best gridworld environment? I want to compare different RL algorithms on it. I’m looking for something super basic: - start and goal state - some obstacles - customisable: move the start and goal state, place obstacles in different points, modify reward map etc. - computationally efficient Thank you submitted by /u/wiston_smith [link] [comments]  ( 1 min )
    Custom Callback for Max Episode Reward using Stable Baselines3 with Custom Env
    Hi all, I've built a custom gym env and am using Stable Baselines3 to train an agent. I would like to visualise in TensorBoard the maximum reward achieved for each episode. I have these values in a list in my env, and I am trying to create a custom Callback to plot this in TensorBoard but it's not working. I've looked over the documentation and other forums but can't figure out how to do this. Can anyone help me out? 🙏🏽 Thank you! submitted by /u/leozinho2r [link] [comments]  ( 1 min )
    Open-sourced NetHack 2021 NeurIPS Challenge winning agent
    Recently, we have released the source code of our winning solution for the NetHack 2021 NeurIPS Challenge: https://github.com/maciej-sypetkowski/autoascend We hope that it will help in leveraging this complex environment, that still seems to be beyond capabilities of reinforcement learning. Check out links in the README "Description" section for more context. submitted by /u/procedural_only [link] [comments]  ( 1 min )
    Project
    Do any of u have any good rl project suggestion or a complete project for college major , I have only done work on some self playing Atari , mario games , if u have any good idea please suggest 🙌 submitted by /u/stoned_egineer [link] [comments]  ( 1 min )
    Training a DQN agent for platformer game
    Does anyone have experience training agents to play platformer games like mario? I am trying to train an agent for the platformer game Jump King to get past atleast a few levels, using DQN but the agent is performing poorly after 8000 episodes of training, (one episode being the agent spawns at the start and has 15 seconds or so to jump around gaining reward) he is barely able to get past the first level most of the time :c I am using a very basic sequential network of 2 Linear layers with inputDim 4, outputDim 4, and hiddenDim 32. and because my state is not using any image data, its just (current_level, x_pos, y_pos, jumpCount) as input to the network . As for the reward, I am using the y position to give reward if the agent is getting to a new level (large reward) or making progress in the current level (curr_y > old_y), otherwise the agent gets a negative reward. Should I consider using a CNN and image data to train this agent like in the atari games paper, or is using image data and a conv net going to perform worse rather than using my current state? Should I consider combining image data with the current state, or just keeping the current non-image data state but ? Also, roughly how long should I be training the agent for? is 8000 episodes not enough? 1 episode takes roughly 7 seconds in time (it is using pygame engine and I turned off the rendering and I think that made it a little bit faster) This is my first time training an agent for a hard game like this using DQN so I would appreciate any tips/advice to improve the agent! repo: https://github.com/senweim/JumpKingAtHome submitted by /u/TernaryJimbo [link] [comments]  ( 2 min )
    Which environment impress you? (related with software architecture, API, ...)
    I want to hear about your impressive environment! Specifically, I want to make my custom environment well using various library like openAI gym. In this contxet, I find out the highway-env https://github.com/eleurent/highway-env/ ! I think this environment has convinient API for users. Thus, I make my custom env with referencing the highway-env https://preview.redd.it/ndl1h1dvpzs81.png?width=711&format=png&auto=webp&s=267320a9713d98e7e49c4bb89423e5a9612bad8e In this line, could you speak your best environment? It doesn't matter about your best env has any advantage! submitted by /u/Seungeon94 [link] [comments]  ( 1 min )

  • Open

    Strategies to deal with Large Action Spaces
    Hey guys, I tried building a PPO model for Wordle. My initial test was checking the performance of the model with just 100 words. The agent was able to learn within a few thousand epochs and had an average guess length of about 2.8 before it could correctly identify the words. However, when i extend the action space to the entire 2.3k words, the model barely learns. Even after a few 100k iterations, the mean length revolves around 5.9 (given wordle has a max of 6 attempts per game) Any suggestions on how to help the agent learn faster in large action spaces? ​ I also tried an embedding based approach, but the performance was very similar. ​ Thanks submitted by /u/altair9335 [link] [comments]  ( 1 min )
    What are your best results for ProcGen: CoinRun?
    Has anyone managed to get a consistent score of > 9 on CoinRun? I understand that some generated levels require LSTMs in order to be solvable 100% of the time, but even excluding these hard-core levels I can see some occasions where my agents are not operating with 100% effectiveness. For some reason "fully solving" CoinRun seems harder than expected. The papers on CoinRun usually just show the results after 100mm steps or so, but I am more interested in what the community has achieved with "normal setups". submitted by /u/tmuxed [link] [comments]  ( 1 min )
    How to use the same action in trained RL network, when model is retested?
    I trained RL agent using stable baseline library and gym env. When I am trying to test agent, this makes different action when I am re running again. I used the same seed in test env. for i in range(length-lags-1): action, _states = model.predict(obs_test) obs_test, rewards, dones, info = env_test When I am runnig again the above code, I am getting the different results. submitted by /u/Mariam_Dundua [link] [comments]  ( 1 min )
    Unity RL ml agents module, walker example
    Hi all, I'm trying to teach my custom fbx model to walk with the help of ppo, as in the example from ml agents. I have difficulties with the exact import and the assignment of rigidbody here, that is, the neural network is being trained, but for some reason physics does not work. Has anyone seen it, or does anyone have an example of how to train a unity custom fbx model using ml agents? Thx all! submitted by /u/IndependenceCivil576 [link] [comments]  ( 1 min )
    Implementing RL algorithm on apache spark
    I want to run RL algorithm on Apache Spark. However, RL does not exists in Spark's MLib. Is it possible to implement it? any links may help. Thank you in advance submitted by /u/fatenLouati [link] [comments]  ( 1 min )
    Is reinforcement learning being used for the development of self-driving cars
    We will introduce the general process of self-driving tasks first and then the development of Reinforcement Learning in self-driving cars. The general process of self-driving tasks includes perceiving, decision-making, planning and controlling. The tasks of perceiving have adopted deep learning and that did a good job. Being different from monitoring learning, decision intelligence AI methods, which are represented by reinforcement learning, model the environment as Markov Decision Process(MDP)to get optimization. In sequential decision problems the utility of agent's actions do not depend on single decisions, expressed with the state, which the agent would have gotten into, as the result of this decision, but rather on the whole sequence of agent's action. Here, one thing that needs t…  ( 4 min )
  • Open

    [R] Transformers replicate Hippocampal representations; notably place and grid cells in the brain
    Paper: https://arxiv.org/abs/2112.04035 Yes, the paper is cautious about comparing the model one-to-one to the brain “Note, we are not saying the brain is closely related to transformers because it learns the same neural representations, instead we are saying the relationship is close because we have shown a mathematical relationship between transformers and carefully formulated neuroscience models of the hippocampal formation.” While objections like "its just correlation/relation, its not exactly the same!!" are true to an extend, its still a very unexpected observation that, they're even remotely similar. Needless to say, Transformers were not inspired from the brain - and as more evidence collates (https://www.nature.com/articles/s42003-022-03036-1 --> Activations are linearly correlatable) it does feel mysterious; perhaps atleast some of the systems used by the brain converge on an efficient pattern discovered by our backpropogated friends... [insert 'coincidence? I think not!' meme] submitted by /u/Competitive-Rub-1958 [link] [comments]  ( 1 min )
    [D] What is the smallest, most capable, generative language model available now?
    I'm looking for a generative-LM equivalent of an EfficientNet-Lite, for inference on devices with limited to no VRAM. I know about some popular ones like DistilGPT2. But it's been 2 years after its release. Surely, someone improved their size/performance ratio, right... right? Thank you for your time. 🤗 submitted by /u/Deep-Station-1746 [link] [comments]  ( 1 min )
    How would you rank major tech companies' research labs for prestige? [D]
    This is just for fun, not to be taken too seriously. But I'm curious what are the reputations among the community for various research divisions (specifically AIML) of major companies, ie: Google, Facebook/Meta, Microsoft, Amazon, NVIDIA, IBM, etc. My perceived (albeit naive) view is Google > Facebook > MSR are top tier. Don't know much about the others. But I've read that some people consider MSR most prestigious due to their academic environment. But I've seen that Google and FB dominate in terms of major publications, ie: vision transformers are associated with Google. submitted by /u/avd4292 [link] [comments]  ( 2 min )
    [P] Squirrel: A new OS library for fast & flexible large-scale data loading
    Hi all, Today we open-sourced Squirrel, a data infrastructure library that my colleagues and I have been working on over the past 1.5 years: https://github.com/merantix-momentum/squirrel-core We’re a team of ~30 ML engineers developing machine learning solutions for industry and research. Across all our projects, we need to load large-scale data in a fast and cost-efficient way, while keeping the flexibility to work with any possible dataset, loaded from local storage, remote data buckets or via APIs such as HuggingFace. Not finding what we were looking for, we decided to build it ourselves. Squirrel has already proven its value in our deep learning projects at Merantix Momentum and shows competitive benchmark results (check them out here). We’re super excited to share it with the OSS community and hope that you can benefit from it as well! Looking forward to hearing your feedback and questions :) submitted by /u/Nextpenade [link] [comments]  ( 1 min )
    [P] Renting lots of GPUs (100-200) in single environment?
    I want to apply an already trained ML model on a huge textual data set. I have funds to rent Cloud GPUs, but have not much experience using them. Preferably, I do the setting up of the environment only once (downloading of the data, model, software packages, etc) only once and then simply send ~100-200 scripts each to their own GPU for processing. Then at the end everything is in the same location and I can easily send back the final result file (~100-200 output files concatenated together) back to my PC. Any advice on how to do that? All GPU renting servers only have 1-8 GPUs per server and do not (seem) to allow for sharing of the environment, which seems very inefficient to me. All comments are appreciated. submitted by /u/Intelligent-End2673 [link] [comments]  ( 2 min )
    [R][P] Algorithmic stability of minibatch SGD
    Hi, was wondering if anyone else has looked into "A PAC-Bayesian Analysis of Randomized Learning with Application to Stochastic Gradient Descent" and in particular eqn. 2, which is the derivation of Section 3.5 of "Train faster, generalize better: Stability of stochastic gradient descent" adapted for the case where the underlying loss we are interested in guaranteeing generalisation for is upper bounded by M (rather than 1 as assumed by Hardt et al). In the case of minibatch SGD, the number of datapoints n becomes the number of minibatches, as ideally one would like to reduce the number of steps T by maximizing the learning rate, which requires maximizing the minibatch size for the loss to actually converge to 0 on the training data. However, what I am unsure about is, specifically for the classification task where we typically minimize the cross-entropy objective, whether the cross-entropy objective is an upper bound on any kind of M-bounded loss function. In the ideal world, I would like to show that it upper bounds the 0-1 loss which means the cross-entropy over the dataset is an upper bound on the classification accuracy and any generalization statement automatically becomes a statement about the very practical metric of accuracy. Such a statement about cross-entropy upper-bounding 0-1 is made in Section 3C of "Theoretical Issues in Deep Networks: Approximation, Optimization and Generalization". However, one can provide a counterexample in the limit of the softmax "temperature" parameter where the predicted class distribution becomes uniform, in the case of 2 classes, for the typical case of log being the natural logarithm (it is no longer a counter-example if log base 2 is used). I haven't been able to show or find proof that this statement "xent >= 0-1" is true (for some logarithm base and some number of classes) and was hoping that someone might have. submitted by /u/wakeupandshave [link] [comments]  ( 1 min )
    [R] Channel Augmented Joint Learning for Visible-Infrared Recognition
    Since going open source in March 2020, MindSpore gone from strength to strength. The deep learning framework has been downloaded by over 1.2 million users; algorithms running on MindSpore have been published in AI journals or presented at conferences; and countless developments have been released in device-edge-cloud scenarios to transform business fields, such as intelligent manufacturing, cloud, wireless, data communication, energy, and consumer business. Built on extensive experience from the scientific, academic, and industrial sectors, MindSpore-based AI papers accounted for 11% of all AI papers in October 2021, ranking No.2 worldwide by month, and No.3 worldwide in Q4 2021. In this blog post, based on a paper published in ICCV 2021 by Professor Mang Ye of Wuhan University, we intro…  ( 3 min )
    [R] Looking for Ideas in Pre-training a RL Agent
    Hi all, I've been working on reinforcement learning lately, but wanted to come to the general ML subreddit to seek inspiration from other disciplines. I've been working on strategies to decrease the training time for my real-world inverted pendulum experiment. Specifically, I am trying to pre-train the Q network in a simulation before deploying. The strategy that I have found most successful right now is this: start with randomly generated weights REPEAT OF AN EPOCH: - Load new_weights to Q Network - initialize an environment with randomly generated parameters (i.e. random mass, lengths, etc). - Train agent on environment for 100 episodes - Save new_weights I have tried a variety of strategies to add a little bit more control over this process. I've tried a soft update that never showed improvement. W = old_weights * (1 - alpha) + new_weights * alpha I have tried an additive update which was slightly successful. Measured the success of each network as the sum of rewards over the epoch. A = (old_R)/(old_R+new_R) ; B = (new_R)/(old_R+new_R) W = old_weights * A + new_weights * B But none of these work as well as just using the most recent weights. I've included some results if anyone's interested. The first graph is three test trials with random initial weights, the second graph is with pre-trained weights. This is a pretty hand-wavy way of doing this, does anyone have any suggestions to do this better? ​ https://preview.redd.it/9mnewmibbts81.png?width=375&format=png&auto=webp&s=a6ff866a31987375d276d12f69dbe2af40380bf4 https://preview.redd.it/096n7ctcbts81.png?width=375&format=png&auto=webp&s=b19d800fbb416fca288e86640d9458c8993e0759 ​ submitted by /u/nickthorpie [link] [comments]  ( 2 min )
    [P] Recommendations for high frequency multivariate time series data
    Hey there! I'm looking for advice on datasets to use for a project. We are looking for the following traits: ​ 1) Multivariate (at least 3 or 4, and probably no more than 50 or 100 as an upper bound). 2) High frequency (Ideally at least once every 5-10 minutes) 3) We need to have some notion of an underlying 'state' of the data for certain windows. E.g. in an energy setting, period X was the 'family at home using appliances' state. Or in the healthcare setting, period X is 'the patient is in a stable state' while period Y is something like 'the patient experiences a cardiac event' ​ ​ Nice to have: 4) It'd be great if some features had some level of seasonality while others didn't. ​ ​ Do folks have any recommendations for datasets that meet some (or hopefully all) of the criteria? I did some light pursuing on UCI, but it seems like much of it is not high frequency enough, and/or doesn't have some notion of underlying states. submitted by /u/CS_Student95 [link] [comments]  ( 1 min )
    [D] What to do when the authors don't release source code?
    Hello, I am currently working on a research paper that I aim to publish at a reputable conference shortly. In our work, we borrow a feature engineering technique from one of the papers that have not been previously applied to the domain (time series AD) before that paper. However, the authors haven't released the source code of their implementation of the model (but the feature engineering technique is publicly available). I feel like that is an important baseline and just failing to include it would get my paper rejected. I have contacted all the authors for the source code, but none of them responded. The architecture they use is a fairly complicated one and would be very difficult to implement on my own. How do I go about this situation? My advisor told me I can just include a few points on the footnote on why we don't include this as a baseline. Those being: No open-source implementation Contacted the authors, didn't receive a response The paper has not been published, only uploaded to arxiv. Any help is appreciated! submitted by /u/mythrowaway0852 [link] [comments]  ( 5 min )
  • Open

    AI when given the prompt of “Amy Schumer” on wombo.art
    submitted by /u/9YearOldGeneralOfPew [link] [comments]  ( 1 min )
    AI News: New Robot Fingertips Can Feel | AI Tracking Satellite | SingularityDAO DynaSets | Tesla Optimus Specs
    submitted by /u/getrich_or_diemining [link] [comments]
    a quick high-level overview of diffusion models (like dall-e 2)
    submitted by /u/individual_kex [link] [comments]
    How can I train an AI to write articles based on my own work?
    Hi all! As a sort of art experiment, I want to train an AI to write tech news articles based on my own work. I worked as a freelance writer for several years and have thousands of articles (each as a Word doc) on tech news. I want to use those articles to train the AI, then have it generate new articles to post to a blog. I have a pretty good understanding of machine learning, but have never trained a model myself. I'm hoping you all can provide some direction. Some specific questions: Can you recommend a model? For each training article, can I provide a "source" (like another news article) so the AI understands where the content in the training article came from? * For each generated article, can I provide a news article source for it to base its content on? ** Can I use the Word docs as the training set, or do I need to convert them into something else for training? *as an example: If I wrote an article on the release of a new Raspberry Pi board, my source might be the press release on the Raspberry Pi website. **as an example: If I want it to generate an article about a new drone delivery service, my input source might be a news article on Reuters or something. submitted by /u/TheSerialHobbyist [link] [comments]  ( 1 min )
    are there any open source video ads generation model out there?
    Hey is there any models to generate videos for advertisment either as text-to-video images-to-video or video-variation creation, if not would video variation generative models would be a good fit for create ads ?? submitted by /u/National-Departure78 [link] [comments]
    DALL-E 2, the future of AI research, and OpenAI’s business model
    submitted by /u/bendee983 [link] [comments]
    how do i learn artificial intelligence from the basics?
    Is there any resources which has example driven explanations, from scratch or basics? I have seen some websites just jumping into "use this module/library" Without explaining what it does or how it works, just some basic examples so that i can build on top or experiment by my own. submitted by /u/-1Mbps [link] [comments]  ( 1 min )
    Using the NEAT algorithm to teach elves to deliver presents
    submitted by /u/zuparnowa [link] [comments]
    Trippy AI Dream 16 - Gothic Style Jungle Fever - VQGAN CliP Rife-Rea...
    submitted by /u/LordPewPew777 [link] [comments]
    Trippy AI Dream 23 - Flower Power² VQGAN CliP Rife-RealESRGAN
    submitted by /u/LordPewPew777 [link] [comments]
    Trippy AI Dream 32 - WE REACHED 100 SUBSCRIBERS !!
    submitted by /u/LordPewPew777 [link] [comments]
    MindSpore has implemented a visible-infrared recognition algorithm
    submitted by /u/Creative_Habit_6868 [link] [comments]
    I want to learn AI From beginning ? from where can i start?
    submitted by /u/Late_Illustrator_545 [link] [comments]  ( 1 min )
    Baidu Researchers Propose PP-YOLOE Object Detector: an Evolved Version of YOLO Achieving SOTA Performance in Object Detection
    Object detection is a crucial problem in computer vision, and YOLO (You Only Look Once) one-stage object detectors have set the bar for performance since the release of YOLOv1 in 2015. The YOLO series has undergone considerable network and structural improvements over the years. The most recent version, YOLOX, has attained an optimal balance of speed and accuracy on the NVIDIA Tesla V100 Tensor Core GPU. Baidu researchers have improved their earlier PP-YOLOv2 model, resulting in PP-YOLOE, a cutting-edge industrial object detector that beats YOLOv5 and YOLOX in speed and accuracy trade-off. The team’s PP-YOLOE-l variant outperforms PP-YOLOv2 by 1.9 percent AP and YOLOX-l by 1.3 percent AP on COCO datasets. The PP-YOLOv2 baseline model architecture comprises a ResNet50-vd backbone with deformable convolution, a PAN neck with an SPP layer and DropBlock, and a lightweight IoU aware head. PP-YOLOv2 assigns only one anchor box to each ground truth object, similar to YOLOv3. It is strongly reliant on hand-crafted design, which may not generalize well enough when trained on other datasets. Conversely, this technique necessitates a lot of additional hyperparameters. To overcome this problem, Baidu researchers have added an anchor-free technique to PP-YOLOv2 that tiles one anchor point on each pixel and assigns upper and lower bounds for detecting heads to assign ground facts to a matching feature map. The center of a bounding box can then be determined to choose positive samples from the closest pixels. A 4D vector is also predicted for regression, with minor model speedups and precision losses due to the changes. Continue Reading Paper: https://arxiv.org/pdf/2203.16250.pdf Github: https://github.com/PaddlePaddle/PaddleDetection submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    Song writing Ai
    Hi all, I’m hoping someone could point me in The direction of an AI that I could dump all my previous song writing into that would spit out something "inspired by’ it. Mostly a bit of fun but interested in seeing what it throws back out at me. thanks in advance for any hot tips. submitted by /u/doccaballero [link] [comments]  ( 1 min )
  • Open

    How to train the NN model with a custom dataset?
    Hi all. I have been trying to work on an object detection project. Basically trying to play around with the codes in the documentation for a custom dataset. I am using Yolov3 and I trained my model using darknet and seems like the model is learning it wrong because of which the weights are not correct either. I don't know how to check that but when doing forward propagation, the array seems to be ok, without nan values, but the confidence is mostly 0's and some 0.25's. Anyone who can guide me, on where I could have gone wrong? ​ code: https://opencv-tutorial.readthedocs.io/en/latest/yolo/yolo.html [yolov3 secion] output: outputs = [[0.03846154 0.03846154 0.27884614 0.21634616 0.5 0.25 ] [0.03846154 0.03846154 0. 0.47596154 0. 0. ] [0.03846154 0.03846154 0.89663464 0.78365386 0.5 0.25 ] ... [0.99038464 0.99038464 0.02403846 0.03125 0.5 0.25 ] [0.99038464 0.99038464 0.03846154 0.07211538 0.5 0. ] [0.99038464 0.99038464 0.07932692 0.05528846 0.5 0. ]] confidence = 0.25 0.0 0.25 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.25 0.25 0.0 0.0 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.25 0.0 0.25 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.25 ... submitted by /u/ersa17 [link] [comments]  ( 1 min )
  • Open

    Two-stage Training of Graph Neural Networks for Graph Classification. (arXiv:2011.05097v4 [cs.LG] UPDATED)
    Graph neural networks (GNNs) have received massive attention in the field of machine learning on graphs. Inspired by the success of neural networks, a line of research has been conducted to train GNNs to deal with various tasks, such as node classification, graph classification, and link prediction. In this work, our task of interest is graph classification. Several GNN models have been proposed and shown great accuracy in this task. However, the question is whether usual training methods fully realize the capacity of the GNN models. In this work, we propose a two-stage training framework based on triplet loss. In the first stage, GNN is trained to map each graph to a Euclidean-space vector so that graphs of the same class are close while those of different classes are mapped far apart. Once graphs are well-separated based on labels, a classifier is trained to distinguish between different classes. This method is generic in the sense that it is compatible with any GNN model. By adapting five GNN models to our method, we demonstrate the consistent improvement in accuracy and utilization of each GNN's allocated capacity over the original training method of each model up to 5.4\% points in 12 datasets.  ( 2 min )
    Overlapping Spaces for Compact Graph Representations. (arXiv:2007.02445v3 [cs.LG] UPDATED)
    Various non-trivial spaces are becoming popular for embedding structured data such as graphs, texts, or images. Following spherical and hyperbolic spaces, more general product spaces have been proposed. However, searching for the best configuration of product space is a resource-intensive procedure, which reduces the practical applicability of the idea. We generalize the concept of product space and introduce an overlapping space that does not have the configuration search problem. The main idea is to allow subsets of coordinates to be shared between spaces of different types (Euclidean, hyperbolic, spherical). As a result, parameter optimization automatically learns the optimal configuration. Additionally, overlapping spaces allow for more compact representations since their geometry is more complex. Our experiments confirm that overlapping spaces outperform the competitors in graph embedding tasks. Here, we consider both distortion setup, where the aim is to preserve distances, and ranking setup, where the relative order should be preserved. The proposed method effectively solves the problem and outperforms the competitors in both settings. We also perform an empirical analysis in a realistic information retrieval task, where we compare all spaces by incorporating them into DSSM. In this case, the proposed overlapping space consistently achieves nearly optimal results without any configuration tuning. This allows for reducing training time, which can be significant in large-scale applications.  ( 2 min )
    CONet: Channel Optimization for Convolutional Neural Networks. (arXiv:2108.06822v2 [cs.CV] UPDATED)
    Neural Architecture Search (NAS) has shifted network design from using human intuition to leveraging search algorithms guided by evaluation metrics. We study channel size optimization in convolutional neural networks (CNN) and identify the role it plays in model accuracy and complexity. Current channel size selection methods are generally limited by discrete sample spaces while suffering from manual iteration and simple heuristics. To solve this, we introduce an efficient dynamic scaling algorithm -- CONet -- that automatically optimizes channel sizes across network layers for a given CNN. Two metrics -- "\textit{Rank}" and "\textit{Rank Average Slope}" -- are introduced to identify the information accumulated in training. The algorithm dynamically scales channel sizes up or down over a fixed searching phase. We conduct experiments on CIFAR10/100 and ImageNet datasets and show that CONet can find efficient and accurate architectures searched in ResNet, DARTS, and DARTS+ spaces that outperform their baseline models. This document supersedes previously published paper in ICCV2021-NeurArch workshop. An additional section is included on manual scaling of channel size in CNNs to numerically validate of the metrics used in searching optimum channel configurations in CNNs.  ( 2 min )
    Covariance-Free Sparse Bayesian Learning. (arXiv:2105.10439v2 [eess.SP] UPDATED)
    Sparse Bayesian learning (SBL) is a powerful framework for tackling the sparse coding problem while also providing uncertainty quantification. The most popular inference algorithms for SBL exhibit prohibitively large computational costs for high-dimensional problems due to the need to maintain a large covariance matrix. To resolve this issue, we introduce a new method for accelerating SBL inference -- named covariance-free expectation maximization (CoFEM) -- that avoids explicit computation of the covariance matrix. CoFEM solves multiple linear systems to obtain unbiased estimates of the posterior statistics needed by SBL. This is accomplished by exploiting innovations from numerical linear algebra such as preconditioned conjugate gradient and a little-known diagonal estimation rule. For a large class of compressed sensing matrices, we provide theoretical justifications for why our method scales well in high-dimensional settings. Through simulations, we show that CoFEM can be up to thousands of times faster than existing baselines without sacrificing coding accuracy. Through applications to calcium imaging deconvolution and multi-contrast MRI reconstruction, we show that CoFEM enables SBL to tractably tackle high-dimensional sparse coding problems of practical interest.  ( 2 min )
    MINIMALIST: Mutual INformatIon Maximization for Amortized Likelihood Inference from Sampled Trajectories. (arXiv:2106.01808v3 [cs.LG] UPDATED)
    Simulation-based inference enables learning the parameters of a model even when its likelihood cannot be computed in practice. One class of methods uses data simulated with different parameters to infer models of the likelihood-to-evidence ratio, or equivalently the posterior function. Here we frame the inference task as an estimation of an energy function parametrized with an artificial neural network. We present an intuitive approach where the optimal model of the likelihood-to-evidence ratio is found by maximizing the likelihood of simulated data. Within this framework, the connection between the task of simulation-based inference and mutual information maximization is clear, and we show how several known methods of posterior estimation relate to alternative lower bounds to mutual information. These distinct objective functions aim at the same optimal energy form and therefore can be directly benchmarked. We compare their accuracy in the inference of model parameters, focusing on four dynamical systems that encompass common challenges in time series analysis: dynamics driven by multiplicative noise, nonlinear interactions, chaotic behavior, and high-dimensional parameter space.  ( 2 min )
    Learning Polynomial Transformations. (arXiv:2204.04209v1 [cs.LG])
    We consider the problem of learning high dimensional polynomial transformations of Gaussians. Given samples of the form $p(x)$, where $x\sim N(0, \mathrm{Id}_r)$ is hidden and $p: \mathbb{R}^r \to \mathbb{R}^d$ is a function where every output coordinate is a low-degree polynomial, the goal is to learn the distribution over $p(x)$. This problem is natural in its own right, but is also an important special case of learning deep generative models, namely pushforwards of Gaussians under two-layer neural networks with polynomial activations. Understanding the learnability of such generative models is crucial to understanding why they perform so well in practice. Our first main result is a polynomial-time algorithm for learning quadratic transformations of Gaussians in a smoothed setting. Our second main result is a polynomial-time algorithm for learning constant-degree polynomial transformations of Gaussian in a smoothed setting, when the rank of the associated tensors is small. In fact our results extend to any rotation-invariant input distribution, not just Gaussian. These are the first end-to-end guarantees for learning a pushforward under a neural network with more than one layer. Along the way, we also give the first polynomial-time algorithms with provable guarantees for tensor ring decomposition, a popular generalization of tensor decomposition that is used in practice to implicitly store large tensors.  ( 2 min )
    Measuring AI Systems Beyond Accuracy. (arXiv:2204.04211v1 [cs.SE])
    Current test and evaluation (T&E) methods for assessing machine learning (ML) system performance often rely on incomplete metrics. Testing is additionally often siloed from the other phases of the ML system lifecycle. Research investigating cross-domain approaches to ML T&E is needed to drive the state of the art forward and to build an Artificial Intelligence (AI) engineering discipline. This paper advocates for a robust, integrated approach to testing by outlining six key questions for guiding a holistic T&E strategy.  ( 2 min )
    Learning-Based Vulnerability Analysis of Cyber-Physical Systems. (arXiv:2103.06271v3 [cs.CR] UPDATED)
    This work focuses on the use of deep learning for vulnerability analysis of cyber-physical systems (CPS). Specifically, we consider a control architecture widely used in CPS (e.g., robotics), where the low-level control is based on e.g., the extended Kalman filter (EKF) and an anomaly detector. To facilitate analyzing the impact potential sensing attacks could have, our objective is to develop learning-enabled attack generators capable of designing stealthy attacks that maximally degrade system operation. We show how such problem can be cast within a learning-based grey-box framework where parts of the runtime information are known to the attacker, and introduce two models based on feed-forward neural networks (FNN); both models are trained offline, using a cost function that combines the attack effects on the estimation error and the residual signal used for anomaly detection, so that the trained models are capable of recursively generating such effective sensor attacks in real-time. The effectiveness of the proposed methods is illustrated on several case studies.  ( 2 min )
    A Low-Cost Robot Science Kit for Education with Symbolic Regression for Hypothesis Discovery and Validation. (arXiv:2204.04187v1 [cond-mat.mtrl-sci])
    The next generation of physical science involves robot scientists - autonomous physical science systems capable of experimental design, execution, and analysis in a closed loop. Such systems have shown real-world success for scientific exploration and discovery, including the first discovery of a best-in-class material. To build and use these systems, the next generation workforce requires expertise in diverse areas including ML, control systems, measurement science, materials synthesis, decision theory, among others. However, education is lagging. Educators need a low-cost, easy-to-use platform to teach the required skills. Industry can also use such a platform for developing and evaluating autonomous physical science methodologies. We present the next generation in science education, a kit for building a low-cost autonomous scientist. The kit was used during two courses at the University of Maryland to teach undergraduate and graduate students autonomous physical science. We discuss its use in the course and its greater capability to teach the dual tasks of autonomous model exploration, optimization, and determination, with an example of autonomous experimental "discovery" of the Henderson-Hasselbalch equation.  ( 2 min )
    Neural graph embeddings via matrix factorization for link prediction: smoothing or truncating negatives?. (arXiv:2011.09907v2 [cs.SI] UPDATED)
    Learning good quality neural graph embeddings has long been achieved by minimzing the pointwise mutual information (PMI) for co-occuring nodes in simulated random walks. This design choice has been mostly popularized by the direct application of the highly-successful word embedding algorithm word2vec to predicting the formation of new links in social, co-citation, and biological networks. However, such a skeumorphic design of graph embedding methods entails a truncation of information coming from pairs of nodes with low PMI. To circumvent this issue, we propose an improved approach to learning low-rank factorization embeddings that incorporate information from such unlikely pairs of nodes and show that it can improve the link prediction performance of baseline methods from 1.2% to 24.2%. Based on our results and observations we outline further steps that could improve the design of next graph embedding algorithms that are based on matrix factorizaion.  ( 2 min )
    Automatic Data Augmentation Selection and Parametrization in Contrastive Self-Supervised Speech Representation Learning. (arXiv:2204.04170v1 [eess.AS])
    Contrastive learning enables learning useful audio and speech representations without ground-truth labels by maximizing the similarity between latent representations of similar signal segments. In this framework various data augmentation techniques are usually exploited to help enforce desired invariances within the learned representations, improving performance on various audio tasks thanks to more robust embeddings. Now, selecting the most relevant augmentations has proven crucial for better downstream performances. Thus, this work introduces a conditional independance-based method which allows for automatically selecting a suitable distribution on the choice of augmentations and their parametrization from a set of predefined ones, for contrastive self-supervised pre-training. This is performed with respect to a downstream task of interest, hence saving a costly hyper-parameter search. Experiments performed on two different downstream tasks validate the proposed approach showing better results than experimenting without augmentation or with baseline augmentations. We furthermore conduct a qualitative analysis of the automatically selected augmentations and their variation according to the considered final downstream dataset.  ( 2 min )
    Benchmarks, Algorithms, and Metrics for Hierarchical Disentanglement. (arXiv:2102.05185v4 [cs.LG] UPDATED)
    In representation learning, there has been recent interest in developing algorithms to disentangle the ground-truth generative factors behind a dataset, and metrics to quantify how fully this occurs. However, these algorithms and metrics often assume that both representations and ground-truth factors are flat, continuous, and factorized, whereas many real-world generative processes involve rich hierarchical structure, mixtures of discrete and continuous variables with dependence between them, and even varying intrinsic dimensionality. In this work, we develop benchmarks, algorithms, and metrics for learning such hierarchical representations.  ( 2 min )
    Adaptive dynamic programming for nonaffine nonlinear optimal control problem with state constraints. (arXiv:1911.11397v3 [eess.SY] UPDATED)
    This paper presents a constrained adaptive dynamic programming (CADP) algorithm to solve general nonlinear nonaffine optimal control problems with known dynamics. Unlike previous ADP algorithms, it can directly deal with problems with state constraints. Firstly, a constrained generalized policy iteration (CGPI) framework is developed to handle state constraints by transforming the traditional policy improvement process into a constrained policy optimization problem. Next, we propose an actor-critic variant of CGPI, called CADP, in which both policy and value functions are approximated by multi-layer neural networks to directly map the system states to control inputs and value function, respectively. CADP linearizes the constrained optimization problem locally into a quadratically constrained linear programming problem, and then obtains the optimal update of the policy network by solving its dual problem. A trust region constraint is added to prevent excessive policy update, thus ensuring linearization accuracy. We determine the feasibility of the policy optimization problem by calculating the minimum trust region boundary and update the policy using two recovery rules when infeasible. The vehicle control problem in the path-tracking task is used to demonstrate the effectiveness of this proposed method.  ( 2 min )
    TF-Coder: Program Synthesis for Tensor Manipulations. (arXiv:2003.09040v4 [cs.PL] UPDATED)
    The success and popularity of deep learning is on the rise, partially due to powerful deep learning frameworks such as TensorFlow and PyTorch that make it easier to develop deep learning models. However, these libraries also come with steep learning curves, since programming in these frameworks is quite different from traditional imperative programming with explicit loops and conditionals. In this work, we present a tool called TF-Coder for programming by example in TensorFlow. TF-Coder uses a bottom-up weighted enumerative search, with value-based pruning of equivalent expressions and flexible type- and value-based filtering to ensure that expressions adhere to various requirements imposed by the TensorFlow library. We train models to predict TensorFlow operations from features of the input and output tensors and natural language descriptions of tasks, to prioritize relevant operations during search. TF-Coder solves 63 of 70 real-world tasks within 5 minutes, sometimes finding simpler solutions in less time compared to experienced human programmers.  ( 2 min )
    Karaoker: Alignment-free singing voice synthesis with speech training data. (arXiv:2204.04127v1 [eess.AS])
    Existing singing voice synthesis models (SVS) are usually trained on singing data and depend on either error-prone time-alignment and duration features or explicit music score information. In this paper, we propose Karaoker, a multispeaker Tacotron-based model conditioned on voice characteristic features that is trained exclusively on spoken data without requiring time-alignments. Karaoker synthesizes singing voice following a multi-dimensional template extracted from a source waveform of an unseen speaker/singer. The model is jointly conditioned with a single deep convolutional encoder on continuous data including pitch, intensity, harmonicity, formants, cepstral peak prominence and octaves. We extend the text-to-speech training objective with feature reconstruction, classification and speaker identification tasks that guide the model to an accurate result. Except for multi-tasking, we also employ a Wasserstein GAN training scheme as well as new losses on the acoustic model's output to further refine the quality of the model.  ( 2 min )
    On the Convergence of Stochastic Extragradient for Bilinear Games using Restarted Iteration Averaging. (arXiv:2107.00464v4 [math.OC] UPDATED)
    We study the stochastic bilinear minimax optimization problem, presenting an analysis of the same-sample Stochastic ExtraGradient (SEG) method with constant step size, and presenting variations of the method that yield favorable convergence. In sharp contrasts with the basic SEG method whose last iterate only contracts to a fixed neighborhood of the Nash equilibrium, SEG augmented with iteration averaging provably converges to the Nash equilibrium under the same standard settings, and such a rate is further improved by incorporating a scheduled restarting procedure. In the interpolation setting where noise vanishes at the Nash equilibrium, we achieve an optimal convergence rate up to tight constants. We present numerical experiments that validate our theoretical findings and demonstrate the effectiveness of the SEG method when equipped with iteration averaging and restarting.  ( 2 min )
    GPSAF: A Generalized Probabilistic Surrogate-Assisted Framework for Constrained Single- and Multi-objective Optimization. (arXiv:2204.04054v1 [math.OC])
    Significant effort has been made to solve computationally expensive optimization problems in the past two decades, and various optimization methods incorporating surrogates into optimization have been proposed. Most research focuses on either exploiting the surrogate by defining a utility optimization problem or customizing an existing optimization method to use one or multiple approximation models. However, only a little attention has been paid to generic concepts applicable to different types of algorithms and optimization problems simultaneously. Thus this paper proposes a generalized probabilistic surrogate-assisted framework (GPSAF), applicable to a broad category of unconstrained and constrained, single- and multi-objective optimization algorithms. The idea is based on a surrogate assisting an existing optimization method. The assistance is based on two distinct phases, one facilitating exploration and another exploiting the surrogates. The exploration and exploitation of surrogates are automatically balanced by performing a probabilistic knockout tournament among different clusters of solutions. A study of multiple well-known population-based optimization algorithms is conducted with and without the proposed surrogate assistance on single- and multi-objective optimization problems with a maximum solution evaluation budget of 300 or less. The results indicate the effectiveness of applying GPSAF to an optimization algorithm and the competitiveness with other surrogate-assisted algorithms.  ( 2 min )
    Checking HateCheck: a cross-functional analysis of behaviour-aware learning for hate speech detection. (arXiv:2204.04042v1 [cs.CL])
    Behavioural testing -- verifying system capabilities by validating human-designed input-output pairs -- is an alternative evaluation method of natural language processing systems proposed to address the shortcomings of the standard approach: computing metrics on held-out data. While behavioural tests capture human prior knowledge and insights, there has been little exploration on how to leverage them for model training and development. With this in mind, we explore behaviour-aware learning by examining several fine-tuning schemes using HateCheck, a suite of functional tests for hate speech detection systems. To address potential pitfalls of training on data originally intended for evaluation, we train and evaluate models on different configurations of HateCheck by holding out categories of test cases, which enables us to estimate performance on potentially overlooked system properties. The fine-tuning procedure led to improvements in the classification accuracy of held-out functionalities and identity groups, suggesting that models can potentially generalise to overlooked functionalities. However, performance on held-out functionality classes and i.i.d. hate speech detection data decreased, which indicates that generalisation occurs mostly across functionalities from the same class and that the procedure led to overfitting to the HateCheck data distribution.  ( 2 min )
    Neural network training under semidefinite constraints. (arXiv:2201.00632v2 [cs.LG] UPDATED)
    This paper is concerned with the training of neural networks (NNs) under semidefinite constraints, which allows for NN training with robustness and stability guarantees. In particular, we set up an efficient and scalable training scheme for NN training problems of this kind based on interior point methods, while we also exploit the structure of the underlying matrix constraint. We apply our training scheme to several relevant examples that have been studied in the literature and newly present the application of the method to the training of Wasserstein generative adversarial networks (WGANs). In numerical examples, we show the superiority of our method and its applicability to WGAN training.
    Text-Aware Predictive Monitoring of Business Processes. (arXiv:2104.09962v2 [cs.AI] CROSS LISTED)
    The real-time prediction of business processes using historical event data is an important capability of modern business process monitoring systems. Existing process prediction methods are able to also exploit the data perspective of recorded events, in addition to the control-flow perspective. However, while well-structured numerical or categorical attributes are considered in many prediction techniques, almost no technique is able to utilize text documents written in natural language, which can hold information critical to the prediction task. In this paper, we illustrate the design, implementation, and evaluation of a novel text-aware process prediction model based on Long Short-Term Memory (LSTM) neural networks and natural language models. The proposed model can take categorical, numerical and textual attributes in event data into account to predict the activity and timestamp of the next event, the outcome, and the cycle time of a running process instance. Experiments show that the text-aware model is able to outperform state-of-the-art process prediction methods on simulated and real-world event logs containing textual data.
    Neural Network Optimization for Reinforcement Learning Tasks Using Sparse Computations. (arXiv:2201.02571v2 [cs.LG] UPDATED)
    This article proposes a sparse computation-based method for optimizing neural networks for reinforcement learning (RL) tasks. This method combines two ideas: neural network pruning and taking into account input data correlations; it makes it possible to update neuron states only when changes in them exceed a certain threshold. It significantly reduces the number of multiplications when running neural networks. We tested different RL tasks and achieved 20-150x reduction in the number of multiplications. There were no substantial performance losses; sometimes the performance even improved.
    A Manifold View of Adversarial Risk. (arXiv:2203.13277v2 [cs.LG] UPDATED)
    The adversarial risk of a machine learning model has been widely studied. Most previous works assume that the data lies in the whole ambient space. We propose to take a new angle and take the manifold assumption into consideration. Assuming data lies in a manifold, we investigate two new types of adversarial risk, the normal adversarial risk due to perturbation along normal direction, and the in-manifold adversarial risk due to perturbation within the manifold. We prove that the classic adversarial risk can be bounded from both sides using the normal and in-manifold adversarial risks. We also show with a surprisingly pessimistic case that the standard adversarial risk can be nonzero even when both normal and in-manifold risks are zero. We finalize the paper with empirical studies supporting our theoretical results. Our results suggest the possibility of improving the robustness of a classifier by only focusing on the normal adversarial risk.
    An analysis of over-sampling labeled data in semi-supervised learning with FixMatch. (arXiv:2201.00604v2 [cs.LG] UPDATED)
    Most semi-supervised learning methods over-sample labeled data when constructing training mini-batches. This paper studies whether this common practice improves learning and how. We compare it to an alternative setting where each mini-batch is uniformly sampled from all the training data, labeled or not, which greatly reduces direct supervision from true labels in typical low-label regimes. However, this simpler setting can also be seen as more general and even necessary in multi-task problems where over-sampling labeled data would become intractable. Our experiments on semi-supervised CIFAR-10 image classification using FixMatch show a performance drop when using the uniform sampling approach which diminishes when the amount of labeled data or the training time increases. Further, we analyse the training dynamics to understand how over-sampling of labeled data compares to uniform sampling. Our main finding is that over-sampling is especially beneficial early in training but gets less important in the later stages when more pseudo-labels become correct. Nevertheless, we also find that keeping some true labels remains important to avoid the accumulation of confirmation errors from incorrect pseudo-labels.
    Combining Evolution and Deep Reinforcement Learning for Policy Search: a Survey. (arXiv:2203.14009v3 [cs.LG] UPDATED)
    Deep neuroevolution and deep Reinforcement Learning have received a lot of attention in the last years. Some works have compared them, highlighting theirs pros and cons, but an emerging trend consists in combining them so as to benefit from the best of both worlds. In this paper, we provide a survey of this emerging trend by organizing the literature into related groups of works and casting all the existing combinations in each group into a generic framework. We systematically cover all easily available papers irrespective of their publication status, focusing on the combination mechanisms rather than on the experimental results. In total, we cover 45 algorithms more recent than 2017. We hope this effort will favor the growth of the domain by facilitating the understanding of the relationships between the methods, leading to deeper analyses, outlining missing useful comparisons and suggesting new combinations of mechanisms.
    Generative Adversarial Method Based On Neural Tangent Kernels. (arXiv:2204.04090v1 [cs.LG])
    The recent development of Generative adversarial networks (GANs) has driven many computer vision applications. Despite the great synthesis quality, training GANs often confronts several issues, including non-convergence, mode collapse, and gradient vanishing. There exist several workarounds, for example, regularizing Lipschitz continuity and adopting Wasserstein distance. Although these methods can partially solve the problems, we argue that the problems are result from modeling the discriminator with deep neural networks. In this paper, we base on newly derived deep neural network theories called Neural Tangent Kernel (NTK) and propose a new generative algorithm called generative adversarial NTK (GA-NTK). The GA-NTK models the discriminator as a Gaussian Process (GP). With the help of the NTK theories, the training dynamics of GA-NTK can be described with a closed-form formula. To synthesize data with the closed-form formula, the objectives can be simplified into a single-level adversarial optimization problem. We conduct extensive experiments on real-world datasets, and the results show that GA-NTK can generate images comparable to those by GANs but is much easier to train under various conditions. We also study the current limitations of GA-NTK and propose some workarounds to make GA-NTK more practical.  ( 2 min )
    Ranking with submodular functions on a budget. (arXiv:2204.04168v1 [cs.DS])
    Submodular maximization has been the backbone of many important machine-learning problems, and has applications to viral marketing, diversification, sensor placement, and more. However, the study of maximizing submodular functions has mainly been restricted in the context of selecting a set of items. On the other hand, many real-world applications require a solution that is a ranking over a set of items. The problem of ranking in the context of submodular function maximization has been considered before, but to a much lesser extent than item-selection formulations. In this paper, we explore a novel formulation for ranking items with submodular valuations and budget constraints. We refer to this problem as max-submodular ranking (MSR). In more detail, given a set of items and a set of non-decreasing submodular functions, where each function is associated with a budget, we aim to find a ranking of the set of items that maximizes the sum of values achieved by all functions under the budget constraints. For the MSR problem with cardinality- and knapsack-type budget constraints we propose practical algorithms with approximation guarantees. In addition, we perform an empirical evaluation, which demonstrates the superior performance of the proposed algorithms against strong baselines.  ( 2 min )
    Sample Complexity versus Depth: An Information Theoretic Analysis. (arXiv:2203.00246v3 [cs.LG] UPDATED)
    Deep learning has proven effective across a range of data sets. In light of this, a natural inquiry is: "for what data generating processes can deep learning succeed?" In this work, we study the sample complexity of learning multilayer data generating processes of a sort for which deep neural networks seem to be suited. We develop general and elegant information-theoretic tools that accommodate analysis of any data generating process -- shallow or deep, parametric or nonparametric, noiseless or noisy. We then use these tools to characterize the dependence of sample complexity on the depth of multilayer processes. Our results indicate roughly linear dependence on depth. This is in contrast to previous results that suggest exponential or high-order polynomial dependence.
    Spinning Language Models: Risks of Propaganda-As-A-Service and Countermeasures. (arXiv:2112.05224v2 [cs.CR] UPDATED)
    We investigate a new threat to neural sequence-to-sequence (seq2seq) models: training-time attacks that cause models to "spin" their outputs so as to support an adversary-chosen sentiment or point of view -- but only when the input contains adversary-chosen trigger words. For example, a spinned summarization model outputs positive summaries of any text that mentions the name of some individual or organization. Model spinning introduces a "meta-backdoor" into a model. Whereas conventional backdoors cause models to produce incorrect outputs on inputs with the trigger, outputs of spinned models preserve context and maintain standard accuracy metrics, yet also satisfy a meta-task chosen by the adversary. Model spinning enables propaganda-as-a-service, where propaganda is defined as biased speech. An adversary can create customized language models that produce desired spins for chosen triggers, then deploy these models to generate disinformation (a platform attack), or else inject them into ML training pipelines (a supply-chain attack), transferring malicious functionality to downstream models trained by victims. To demonstrate the feasibility of model spinning, we develop a new backdooring technique. It stacks an adversarial meta-task onto a seq2seq model, backpropagates the desired meta-task output to points in the word-embedding space we call "pseudo-words," and uses pseudo-words to shift the entire output distribution of the seq2seq model. We evaluate this attack on language generation, summarization, and translation models with different triggers and meta-tasks such as sentiment, toxicity, and entailment. Spinned models largely maintain their accuracy metrics (ROUGE and BLEU) while shifting their outputs to satisfy the adversary's meta-task. We also show that, in the case of a supply-chain attack, the spin functionality transfers to downstream models.
    Federated Learning with Adaptive Batchnorm for Personalized Healthcare. (arXiv:2112.00734v2 [cs.LG] UPDATED)
    There is a growing interest in applying machine learning techniques for healthcare. Recently, federated machine learning (FL) is gaining popularity since it allows researchers to train powerful models without compromising data privacy and security. However, the performance of existing FL approaches often deteriorates when encountering non-iid situations where there exist distribution gaps among clients, and few previous efforts focus on personalization in healthcare. In this article, we propose AdaFed to tackle domain shifts and obtain personalized models for local clients. AdaFed learns the similarity between clients via the statistics of the batch normalization layers while preserving the specificity of each client with different local batch normalization. Comprehensive experiments on five healthcare benchmarks demonstrate that AdaFed achieves better accuracy compared to state-of-the-art methods (e.g., \textbf{10}\%+ accuracy improvement for PAMAP2) with faster convergence speed.
    Human Hands as Probes for Interactive Object Understanding. (arXiv:2112.09120v2 [cs.CV] UPDATED)
    Interactive object understanding, or what we can do to objects and how is a long-standing goal of computer vision. In this paper, we tackle this problem through observation of human hands in in-the-wild egocentric videos. We demonstrate that observation of what human hands interact with and how can provide both the relevant data and the necessary supervision. Attending to hands, readily localizes and stabilizes active objects for learning and reveals places where interactions with objects occur. Analyzing the hands shows what we can do to objects and how. We apply these basic principles on the EPIC-KITCHENS dataset, and successfully learn state-sensitive features, and object affordances (regions of interaction and afforded grasps), purely by observing hands in egocentric videos.
    Image prediction of disease progression by style-based manifold extrapolation. (arXiv:2111.11439v2 [eess.IV] UPDATED)
    Disease-modifying management aims to prevent deterioration and progression of the disease, not just relieve symptoms. Unfortunately, the development of necessary therapies is often hampered by the failure to recognize the presymptomatic disease and limited understanding of disease development. We present a generic solution for this problem by a methodology that allows the prediction of progression risk and morphology in individuals using a latent extrapolation optimization approach. To this end, we combined a regularized generative adversarial network (GAN) and a latent nearest neighbor algorithm for joint optimization to generate plausible images of future time points. We evaluated our method on osteoarthritis (OA) data from a multi-center longitudinal study (the Osteoarthritis Initiative, OAI). With presymptomatic baseline data, our model is generative and significantly outperforms the end-to-end learning model in discriminating the progressive cohort. Two experiments were performed with seven experienced radiologists. When no synthetic follow-up radiographs were provided, our model performed better than all seven radiologists. In cases where the synthetic follow-ups generated by our model were available, the specificity and sensitivity of all readers in discriminating progressors increased from $72.3\%$ to $88.6\%$ and from $42.1\%$ to $51.6\%$, respectively. Our results open up a new possibility of using model-based morphology and risk prediction to make predictions about future disease occurrence, as demonstrated in the example of OA.
    Interactive Feature Fusion for End-to-End Noise-Robust Speech Recognition. (arXiv:2110.05267v2 [eess.AS] UPDATED)
    Speech enhancement (SE) aims to suppress the additive noise from a noisy speech signal to improve the speech's perceptual quality and intelligibility. However, the over-suppression phenomenon in the enhanced speech might degrade the performance of downstream automatic speech recognition (ASR) task due to the missing latent information. To alleviate such problem, we propose an interactive feature fusion network (IFF-Net) for noise-robust speech recognition to learn complementary information from the enhanced feature and original noisy feature. Experimental results show that the proposed method achieves absolute word error rate (WER) reduction of 4.1% over the best baseline on RATS Channel-A corpus. Our further analysis indicates that the proposed IFF-Net can complement some missing information in the over-suppressed enhanced feature.
    Active Linear Regression for $\ell_p$ Norms and Beyond. (arXiv:2111.04888v3 [cs.LG] UPDATED)
    We study active sampling algorithms for linear regression, which aim to query only a few entries of a target vector $b\in\mathbb R^n$ and output a near minimizer to $\min_{x\in\mathbb R^d} \|Ax-b\|$, for a design matrix $A\in\mathbb R^{n \times d}$ and loss $\|\cdot\|$. For $p$ norm regression for any $0<p<\infty$, we give an algorithm based on Lewis weight sampling outputting a $(1+\epsilon)$-approximate solution using just $\tilde O(d/\epsilon^2)$ queries to $b$ for $p\in(0,1)$, $\tilde{O}(d/\epsilon)$ queries for $1<p<2$, and $\tilde{O}(d^{p/2}/\epsilon^p)$ queries for $2<p<\infty$. For $0<p<2$, our bounds are optimal up to log factors, settling the query complexity for this range. For $2<p<\infty$, our dependence on $d$ is optimal, while our dependence on $\epsilon$ is off by at most $\epsilon$, up to log factors. Our result resolves an open question of [CD21], who gave near optimal bounds for the $1$ norm, but required $d^2/\epsilon^2$ samples for $\ell_p$ regression with $1<p<2$, and gave no bounds for $2<p<\infty$ or $0<p<1$. We also give the first total sensitivity bound of $O(d^{\max\{1,p/2\}}\log^2n)$ for loss functions of degree $p$ polynomial growth, improving a result of [TMF20]. By combining this with our techniques for $\ell_p$ regression, we obtain an active regression algorithm making $\tilde O(d^{1+\max\{1,p/2\}}/\mathrm{poly}(\epsilon))$ queries for such loss functions, including the Tukey and Huber losses, answering another question of [CD21]. For the Huber loss, we further improve our bound to $\tilde O(d^{4-2\sqrt2}/\mathrm{poly}(\epsilon))$ samples. Our sensitivity bounds also have many applications, including Orlicz norm subspace embeddings, robust subspace approximation, and dimension reduction for smoothed $p$-norms. Finally, our active sampling results give the first sublinear time algorithms for Kronecker product regression under every $p$ norm.
    DAD: Data-free Adversarial Defense at Test Time. (arXiv:2204.01568v2 [cs.LG] UPDATED)
    Deep models are highly susceptible to adversarial attacks. Such attacks are carefully crafted imperceptible noises that can fool the network and can cause severe consequences when deployed. To encounter them, the model requires training data for adversarial training or explicit regularization-based techniques. However, privacy has become an important concern, restricting access to only trained models but not the training data (e.g. biometric data). Also, data curation is expensive and companies may have proprietary rights over it. To handle such situations, we propose a completely novel problem of 'test-time adversarial defense in absence of training data and even their statistics'. We solve it in two stages: a) detection and b) correction of adversarial samples. Our adversarial sample detection framework is initially trained on arbitrary data and is subsequently adapted to the unlabelled test data through unsupervised domain adaptation. We further correct the predictions on detected adversarial samples by transforming them in Fourier domain and obtaining their low frequency component at our proposed suitable radius for model prediction. We demonstrate the efficacy of our proposed technique via extensive experiments against several adversarial attacks and for different model architectures and datasets. For a non-robust Resnet-18 model pre-trained on CIFAR-10, our detection method correctly identifies 91.42% adversaries. Also, we significantly improve the adversarial accuracy from 0% to 37.37% with a minimal drop of 0.02% in clean accuracy on state-of-the-art 'Auto Attack' without having to retrain the model.
    Measuring disentangled generative spatio-temporal representation. (arXiv:2202.04821v2 [cs.LG] UPDATED)
    Disentangled representation learning offers useful properties such as dimension reduction and interpretability, which are essential to modern deep learning approaches. Although deep learning techniques have been widely applied to spatio-temporal data mining, there has been little attention to further disentangle the latent features and understanding their contribution to the model performance, particularly their mutual information and correlation across features. In this study, we adopt two state-of-the-art disentangled representation learning methods and apply them to three large-scale public spatio-temporal datasets. To evaluate their performance, we propose an internal evaluation metric focusing on the degree of correlations among latent variables of the learned representations and the prediction performance of the downstream tasks. Empirical results show that our modified method can learn disentangled representations that achieve the same level of performance as existing state-of-the-art ST deep learning methods in a spatio-temporal sequence forecasting problem. Additionally, we find that our methods can be used to discover real-world spatial-temporal semantics to describe the variables in the learned representation.
    Identifiability of Label Noise Transition Matrix. (arXiv:2202.02016v2 [cs.LG] UPDATED)
    The noise transition matrix plays a central role in the problem of learning from noisy labels. Among many other reasons, a significant number of existing solutions rely on access to it. Estimating the transition matrix without using ground truth labels is a critical and challenging task. When label noise transition depends on each instance, the problem of identifying the instance-dependent noise transition matrix becomes substantially more challenging. Despite recent works proposing solutions for learning from instance-dependent noisy labels, we lack a unified understanding of when such a problem remains identifiable, and therefore learnable. This paper seeks to provide answers to a sequence of related questions: What are the primary factors that contribute to the identifiability of a noise transition matrix? Can we explain the observed empirical successes? When a problem is not identifiable, what can we do to make it so? We will relate our theoretical findings to the literature and hope to provide guidelines for developing effective solutions for battling instance-dependent label noise.
    A Spatial-Temporal Attention Multi-Graph Convolution Network for Ride-Hailing Demand Prediction Based on Periodicity with Offset. (arXiv:2203.12505v2 [cs.LG] UPDATED)
    Ride-hailing service is becoming a leading part in urban transportation. To improve the efficiency of ride-hailing service, accurate prediction of transportation demand is a fundamental challenge. In this paper, we tackle this problem from both aspects of network structure and data-set formulation. For network design, we propose a spatial-temporal attention multi-graph convolution network (STA-MGCN). A spatial-temporal layer in STA-MGCN is developed to capture the temporal correlations by temporal attention mechanism and temporal gate convolution, and the spatial correlations by multigraph convolution. A feature cluster layer is introduced to learn latent regional functions and to reduce the computation burden. For the data-set formulation, we develop a novel approach which considers the transportation feature of periodicity with offset. Instead of only using history data during the same time period, the history order demand in forward and backward neighboring time periods from yesterday and last week are also included. Extensive experiments on the three real-world datasets of New-York, Chicago and Chengdu show that the proposed algorithm achieves the state-of-the-art performance for ride-hailing demand prediction.
    AxoNN: An asynchronous, message-driven parallel framework for extreme-scale deep learning. (arXiv:2110.13005v4 [cs.LG] UPDATED)
    In the last few years, the memory requirements to train state-of-the-art neural networks have far exceeded the DRAM capacities of modern hardware accelerators. This has necessitated the development of efficient algorithms to train these neural networks in parallel on large-scale GPU-based clusters. Since computation is relatively inexpensive on modern GPUs, designing and implementing extremely efficient communication in these parallel training algorithms is critical for extracting the maximum performance. This paper presents AxoNN, a parallel deep learning framework that exploits asynchrony and message-driven execution to schedule neural network operations on each GPU, thereby reducing GPU idle time and maximizing hardware efficiency. By using the CPU memory as a scratch space for offloading data periodically during training, AxoNN is able to reduce GPU memory consumption by four times. This allows us to increase the number of parameters per GPU by four times, thus reducing the amount of communication and increasing performance by over 13%. When tested against large transformer models with 12-100 billion parameters on 48-384 NVIDIA Tesla V100 GPUs, AxoNN achieves a per-GPU throughput of 49.4-54.78% of theoretical peak and reduces the training time by 22-37 days (15-25% speedup) as compared to the state-of-the-art.
    Federated Causal Inference in Heterogeneous Observational Data. (arXiv:2107.11732v3 [cs.LG] UPDATED)
    Analyzing observational data from multiple sources can be useful for increasing statistical power to detect a treatment effect; however, practical constraints such as privacy considerations may restrict individual-level information sharing across data sets. This paper develops federated methods that only utilize summary-level information from heterogeneous data sets. Our federated methods provide doubly-robust point estimates of treatment effects as well as variance estimates. We derive the asymptotic distributions of our federated estimators, which are shown to be asymptotically equivalent to the corresponding estimators from the combined, individual-level data. We show that to achieve these properties, federated methods should be adjusted based on conditions such as whether models are correctly specified and stable across heterogeneous data sets.
    Low-Resource Adaptation of Open-Domain Generative Chatbots. (arXiv:2108.06329v2 [cs.CL] UPDATED)
    Recent work building open-domain chatbots has demonstrated that increasing model size improves performance. On the other hand, latency and connectivity considerations dictate the move of digital assistants on the device. Giving a digital assistant like Siri, Alexa, or Google Assistant the ability to discuss just about anything leads to the need for reducing the chatbot model size such that it fits on the user's device. We demonstrate that low parameter models can simultaneously retain their general knowledge conversational abilities while improving in a specific domain. Additionally, we propose a generic framework that accounts for variety in question types, tracks reference throughout multi-turn conversations, and removes inconsistent and potentially toxic responses. Our framework seamlessly transitions between chatting and performing transactional tasks, which will ultimately make interactions with digital assistants more human-like. We evaluate our framework on 1 internal and 4 public benchmark datasets using both automatic (Perplexity) and human (SSA - Sensibleness and Specificity Average) evaluation metrics and establish comparable performance while reducing model parameters by 90%.
    GCA-Net : Utilizing Gated Context Attention for Improving Image Forgery Localization and Detection. (arXiv:2112.04298v3 [cs.CV] UPDATED)
    Forensic analysis of manipulated pixels requires the identification of various hidden and subtle features from images. Conventional image recognition models generally fail at this task because they are biased and more attentive toward the dominant local and spatial features. In this paper, we propose a novel Gated Context Attention Network (GCA-Net) that utilizes non-local attention in conjunction with a gating mechanism in order to capture the finer image discrepancies and better identify forged regions. The proposed framework uses high dimensional embeddings to filter and aggregate the relevant context from coarse feature maps at various stages of the decoding process. This improves the network's understanding of global differences and reduces false-positive localizations. Our evaluation on standard image forensic benchmarks shows that GCA-Net can both compete against and improve over state-of-the-art networks by an average of 4.7% AUC. Additional ablation studies also demonstrate the method's robustness against attributions and resilience to false-positive predictions.
    Generalizing to Unseen Domains: A Survey on Domain Generalization. (arXiv:2103.03097v6 [cs.LG] UPDATED)
    Machine learning systems generally assume that the training and testing distributions are the same. To this end, a key requirement is to develop models that can generalize to unseen distributions. Domain generalization (DG), i.e., out-of-distribution generalization, has attracted increasing interests in recent years. Domain generalization deals with a challenging setting where one or several different but related domain(s) are given, and the goal is to learn a model that can generalize to an unseen test domain. Great progress has been made in the area of domain generalization for years. This paper presents the first review of recent advances in this area. First, we provide a formal definition of domain generalization and discuss several related fields. We then thoroughly review the theories related to domain generalization and carefully analyze the theory behind generalization. We categorize recent algorithms into three classes: data manipulation, representation learning, and learning strategy, and present several popular algorithms in detail for each category. Third, we introduce the commonly used datasets, applications, and our open-sourced codebase for fair evaluation. Finally, we summarize existing literature and present some potential research topics for the future.
    Group-based Distinctive Image Captioning with Memory Attention. (arXiv:2108.09151v4 [cs.CV] UPDATED)
    Describing images using natural language is widely known as image captioning, which has made consistent progress due to the development of computer vision and natural language generation techniques. Though conventional captioning models achieve high accuracy based on popular metrics, i.e., BLEU, CIDEr, and SPICE, the ability of captions to distinguish the target image from other similar images is under-explored. To generate distinctive captions, a few pioneers employ contrastive learning or re-weighted the ground-truth captions, which focuses on one single input image. However, the relationships between objects in a similar image group (e.g., items or properties within the same album or fine-grained events) are neglected. In this paper, we improve the distinctiveness of image captions using a Group-based Distinctive Captioning Model (GdisCap), which compares each image with other images in one similar group and highlights the uniqueness of each image. In particular, we propose a group-based memory attention (GMA) module, which stores object features that are unique among the image group (i.e., with low similarity to objects in other images). These unique object features are highlighted when generating captions, resulting in more distinctive captions. Furthermore, the distinctive words in the ground-truth captions are selected to supervise the language decoder and GMA. Finally, we propose a new evaluation metric, distinctive word rate (DisWordRate) to measure the distinctiveness of captions. Quantitative results indicate that the proposed method significantly improves the distinctiveness of several baseline models, and achieves the state-of-the-art performance on both accuracy and distinctiveness. Results of a user study agree with the quantitative evaluation and demonstrate the rationality of the new metric DisWordRate.
    Pretext Tasks selection for multitask self-supervised speech representation learning. (arXiv:2107.00594v4 [eess.AS] UPDATED)
    Through solving pretext tasks, self-supervised learning leverages unlabeled data to extract useful latent representations replacing traditional input features in the downstream task. In audio/speech signal processing, a wide range of features where engineered through decades of research efforts. As it turns out, learning to predict such features (a.k.a pseudo-labels) has proven to be a particularly relevant pretext task, leading to useful self-supervised representations which prove to be effective for downstream tasks. However, methods and common practices for combining such pretext tasks for better performance on the downstream task have not been explored and understood properly. In fact, the process relies almost exclusively on a computationally heavy experimental procedure, which becomes intractable with the increase of the number of pretext tasks. This paper introduces a method to select a group of pretext tasks among a set of candidates. The method we propose estimates calibrated weights for the partial losses corresponding to the considered pretext tasks during the self-supervised training process. The experiments conducted on automatic speech recognition, speaker and emotion recognition validate our approach, as the groups selected and weighted with our method perform better than classic baselines, thus facilitating the selection and combination of relevant pseudo-labels for self-supervised representation learning.
    Omni-Training for Data-Efficient Deep Learning. (arXiv:2110.07510v2 [cs.LG] UPDATED)
    Learning a generalizable deep model from a few examples in a short time remains a major challenge of machine learning, which has impeded its wide deployment to many scenarios. Recent advances reveal that a properly pre-trained model endows an important property: transferability. A higher transferability of the learned representations indicates a better generalizability across domains of different distributions (domain transferability), or across tasks of different semantics (task transferability). Transferability has become the key to enable data-efficient deep learning, however, existing pre-training methods focus only on domain transferability while meta-training methods only on task transferability. This restricts their data-efficiency in downstream scenarios of diverging domains and tasks. A finding of this paper is that even a tight combination of pre-training and meta-training cannot achieve both kinds of transferability. This motivates the proposed Omni-Training framework towards data-efficient deep learning. Our first contribution is Omni-Net, a tri-flow architecture. Besides the joint representation flow, Omni-Net introduces two new parallel flows for pre-training and meta-training, respectively responsible for learning representations of domain transferability and task transferability. Omni-Net coordinates the parallel flows by routing them via the joint-flow, making each gain the other kind of transferability. Our second contribution is Omni-Loss, in which a self-distillation regularization is imposed to enable knowledge transfer across the training process. Omni-Training is a general framework that accommodates many existing pre-training and meta-training algorithms. A thorough evaluation on cross-task and cross-domain datasets in classification, regression and reinforcement learning problems shows that Omni-Training consistently outperforms the state-of-the-art methods.
    Constraints Penalized Q-learning for Safe Offline Reinforcement Learning. (arXiv:2107.09003v3 [cs.LG] UPDATED)
    We study the problem of safe offline reinforcement learning (RL), the goal is to learn a policy that maximizes long-term reward while satisfying safety constraints given only offline data, without further interaction with the environment. This problem is more appealing for real world RL applications, in which data collection is costly or dangerous. Enforcing constraint satisfaction is non-trivial, especially in offline settings, as there is a potential large discrepancy between the policy distribution and the data distribution, causing errors in estimating the value of safety constraints. We show that na\"ive approaches that combine techniques from safe RL and offline RL can only learn sub-optimal solutions. We thus develop a simple yet effective algorithm, Constraints Penalized Q-Learning (CPQ), to solve the problem. Our method admits the use of data generated by mixed behavior policies. We present a theoretical analysis and demonstrate empirically that our approach can learn robustly across a variety of benchmark control tasks, outperforming several baselines.
    How to distribute data across tasks for meta-learning?. (arXiv:2103.08463v3 [cs.LG] UPDATED)
    Meta-learning models transfer the knowledge acquired from previous tasks to quickly learn new ones. They are trained on benchmarks with a fixed number of data points per task. This number is usually arbitrary and it is unknown how it affects performance at testing. Since labelling of data is expensive, finding the optimal allocation of labels across training tasks may reduce costs. Given a fixed budget of labels, should we use a small number of highly labelled tasks, or many tasks with few labels each? Should we allocate more labels to some tasks and less to others? We show that: 1) If tasks are homogeneous, there is a uniform optimal allocation, whereby all tasks get the same amount of data; 2) At fixed budget, there is a trade-off between number of tasks and number of data points per task, with a unique solution for the optimum; 3) When trained separately, harder task should get more data, at the cost of a smaller number of tasks; 4) When training on a mixture of easy and hard tasks, more data should be allocated to easy tasks. Interestingly, Neuroscience experiments have shown that human visual skills also transfer better from easy tasks. We prove these results mathematically on mixed linear regression, and we show empirically that the same results hold for few-shot image classification on CIFAR-FS and mini-ImageNet. Our results provide guidance for allocating labels across tasks when collecting data for meta-learning.
    Quantum Machine Learning Framework for Virtual Screening in Drug Discovery: a Prospective Quantum Advantage. (arXiv:2204.04017v1 [quant-ph])
    Machine Learning (ML) for Ligand Based Virtual Screening (LB-VS) is an important in-silico tool for discovering new drugs in a faster and cost-effective manner, especially for emerging diseases such as COVID-19. In this paper, we propose a general-purpose framework combining a classical Support Vector Classifier (SVC) algorithm with quantum kernel estimation for LB-VS on real-world databases, and we argue in favor of its prospective quantum advantage. Indeed, we heuristically prove that our quantum integrated workflow can, at least in some relevant instances, provide a tangible advantage compared to state-of-art classical algorithms operating on the same datasets, showing strong dependence on target and features selection method. Finally, we test our algorithm on IBM Quantum processors using ADRB2 and COVID-19 datasets, showing that hardware simulations provide results in line with the predicted performances and can surpass classical equivalents.
    KCD: Knowledge Walks and Textual Cues Enhanced Political Perspective Detection in News Media. (arXiv:2204.04046v1 [cs.LG])
    Political perspective detection has become an increasingly important task that can help combat echo chambers and political polarization. Previous approaches generally focus on leveraging textual content to identify stances, while they fail to reason with background knowledge or leverage the rich semantic and syntactic textual labels in news articles. In light of these limitations, we propose KCD, a political perspective detection approach to enable multi-hop knowledge reasoning and incorporate textual cues as paragraph-level labels. Specifically, we firstly generate random walks on external knowledge graphs and infuse them with news text representations. We then construct a heterogeneous information network to jointly model news content as well as semantic, syntactic and entity cues in news articles. Finally, we adopt relational graph neural networks for graph-level representation learning and conduct political perspective detection. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods on two benchmark datasets. We further examine the effect of knowledge walks and textual cues and how they contribute to our approach's data efficiency.
    Predicting Berth Stay for Tanker Terminals: A Systematic and Dynamic Approach. (arXiv:2204.04085v1 [cs.CE])
    Given the trend of digitization and increasing number of maritime transport, prediction of vessel berth stay has been triggered for requirements of operation research and scheduling optimization problem in the era of maritime big data, which takes a significant part in port efficiency and maritime logistics enhancement. This study proposes a systematic and dynamic approach of predicting berth stay for tanker terminals. The approach covers three innovative aspects: 1) Data source employed is multi-faceted, including cargo operation data from tanker terminals, time-series data from automatic identification system (AIS), etc. 2) The process of berth stay is decomposed into multiple blocks according to data analysis and information extraction innovatively, and practical operation scenarios are also developed accordingly. 3) The predictive models of berth stay are developed on the basis of prior data analysis and information extraction under two methods, including regression and decomposed distribution. The models are evaluated under four dynamic scenarios with certain designated cargoes among two different terminals. The evaluation results show that the proposed approach can predict berth stay with the accuracy up to 98.81% validated by historical baselines, and also demonstrate the proposed approach has dynamic capability of predicting berth stay among the scenarios. The model may be potentially applied for short-term pilot-booking or scheduling optimizations within a reasonable time frame for advancement of port intelligence and logistics efficiency.
    Global Update Guided Federated Learning. (arXiv:2204.03920v1 [cs.LG])
    Federated learning protects data privacy and security by exchanging models instead of data. However, unbalanced data distributions among participating clients compromise the accuracy and convergence speed of federated learning algorithms. To alleviate this problem, unlike previous studies that limit the distance of updates for local models, we propose global-update-guided federated learning (FedGG), which introduces a model-cosine loss into local objective functions, so that local models can fit local data distributions under the guidance of update directions of global models. Furthermore, considering that the update direction of a global model is informative in the early stage of training, we propose adaptive loss weights based on the update distances of local models. Numerical simulations show that, compared with other advanced algorithms, FedGG has a significant improvement on model convergence accuracies and speeds. Additionally, compared with traditional fixed loss weights, adaptive loss weights enable our algorithm to be more stable and easier to implement in practice.
    Ontology Matching Through Absolute Orientation of Embedding Spaces. (arXiv:2204.04040v1 [cs.AI])
    Ontology matching is a core task when creating interoperable and linked open datasets. In this paper, we explore a novel structure-based mapping approach which is based on knowledge graph embeddings: The ontologies to be matched are embedded, and an approach known as absolute orientation is used to align the two embedding spaces. Next to the approach, the paper presents a first, preliminary evaluation using synthetic and real-world datasets. We find in experiments with synthetic data, that the approach works very well on similarly structured graphs; it handles alignment noise better than size and structural differences in the ontologies.
    Self-supervised Speaker Diarization. (arXiv:2204.04166v1 [cs.SD])
    Over the last few years, deep learning has grown in popularity for speaker verification, identification, and diarization. Inarguably, a significant part of this success is due to the demonstrated effectiveness of their speaker representations. These, however, are heavily dependent on large amounts of annotated data and can be sensitive to new domains. This study proposes an entirely unsupervised deep-learning model for speaker diarization. Specifically, the study focuses on generating high-quality neural speaker representations without any annotated data, as well as on estimating secondary hyperparameters of the model without annotations. The speaker embeddings are represented by an encoder trained in a self-supervised fashion using pairs of adjacent segments assumed to be of the same speaker. The trained encoder model is then used to self-generate pseudo-labels to subsequently train a similarity score between different segments of the same call using probabilistic linear discriminant analysis (PLDA) and further to learn a clustering stopping threshold. We compared our model to state-of-the-art unsupervised as well as supervised baselines on the CallHome benchmarks. According to empirical results, our approach outperforms unsupervised methods when only two speakers are present in the call, and is only slightly worse than recent supervised models.
    C-NMT: A Collaborative Inference Framework for Neural Machine Translation. (arXiv:2204.04043v1 [cs.LG])
    Collaborative Inference (CI) optimizes the latency and energy consumption of deep learning inference through the inter-operation of edge and cloud devices. Albeit beneficial for other tasks, CI has never been applied to the sequence- to-sequence mapping problem at the heart of Neural Machine Translation (NMT). In this work, we address the specific issues of collaborative NMT, such as estimating the latency required to generate the (unknown) output sequence, and show how existing CI methods can be adapted to these applications. Our experiments show that CI can reduce the latency of NMT by up to 44% compared to a non-collaborative approach.
    Transfer Attacks Revisited: A Large-Scale Empirical Study in Real Computer Vision Settings. (arXiv:2204.04063v1 [cs.CV])
    One intriguing property of adversarial attacks is their "transferability" -- an adversarial example crafted with respect to one deep neural network (DNN) model is often found effective against other DNNs as well. Intensive research has been conducted on this phenomenon under simplistic controlled conditions. Yet, thus far, there is still a lack of comprehensive understanding about transferability-based attacks ("transfer attacks") in real-world environments. To bridge this critical gap, we conduct the first large-scale systematic empirical study of transfer attacks against major cloud-based MLaaS platforms, taking the components of a real transfer attack into account. The study leads to a number of interesting findings which are inconsistent to the existing ones, including: (1) Simple surrogates do not necessarily improve real transfer attacks. (2) No dominant surrogate architecture is found in real transfer attacks. (3) It is the gap between posterior (output of the softmax layer) rather than the gap between logit (so-called $\kappa$ value) that increases transferability. Moreover, by comparing with prior works, we demonstrate that transfer attacks possess many previously unknown properties in real-world environments, such as (1) Model similarity is not a well-defined concept. (2) $L_2$ norm of perturbation can generate high transferability without usage of gradient and is a more powerful source than $L_\infty$ norm. We believe this work sheds light on the vulnerabilities of popular MLaaS platforms and points to a few promising research directions.
    EPASAD: Ellipsoid decision boundary based Process-Aware Stealthy Attack Detector. (arXiv:2204.04154v1 [cs.CR])
    Due to the importance of Critical Infrastructure (CI) in a nation's economy, they have been lucrative targets for cyber attackers. These critical infrastructures are usually Cyber-Physical Systems (CPS) such as power grids, water, and sewage treatment facilities, oil and gas pipelines, etc. In recent times, these systems have suffered from cyber attacks numerous times. Researchers have been developing cyber security solutions for CIs to avoid lasting damages. According to standard frameworks, cyber security based on identification, protection, detection, response, and recovery are at the core of these research. Detection of an ongoing attack that escapes standard protection such as firewall, anti-virus, and host/network intrusion detection has gained importance as such attacks eventually affect the physical dynamics of the system. Therefore, anomaly detection in physical dynamics proves an effective means to implement defense-in-depth. PASAD is one example of anomaly detection in the sensor/actuator data, representing such systems' physical dynamics. We present EPASAD, which improves the detection technique used in PASAD to detect these micro-stealthy attacks, as our experiments show that PASAD's spherical boundary-based detection fails to detect. Our method EPASAD overcomes this by using Ellipsoid boundaries, thereby tightening the boundaries in various dimensions, whereas a spherical boundary treats all dimensions equally. We validate EPASAD using the dataset produced by the TE-process simulator and the C-town datasets. The results show that EPASAD improves PASAD's average recall by 5.8% and 9.5% for the two datasets, respectively.
    Optimizing Coordinative Schedules for Tanker Terminals: An Intelligent Large Spatial-Temporal Data-Driven Approach -- Part 2. (arXiv:2204.03955v1 [cs.CE])
    In this study, a novel coordinative scheduling optimization approach is proposed to enhance port efficiency by reducing weighted average turnaround time. The proposed approach is developed as a heuristic algorithm applied and investigated through different observation windows with weekly rolling horizon paradigm method. The experimental results show that the proposed approach is effective and promising on mitigating the turnaround time of vessels. The results demonstrate that largest potential savings of turnaround time (weighted average) are around 17 hours (28%) reduction on baseline of 1-week observation, 45 hours (37%) reduction on baseline of 2-week observation and 70 hours (40%) reduction on baseline of 3-week observation. Even though the experimental results are based on historical datasets, the results potentially present significant benefits if real-time applications were applied under a quadratic computational complexity.
    Labeling-Free Comparison Testing of Deep Learning Models. (arXiv:2204.03994v1 [cs.LG])
    Various deep neural networks (DNNs) are developed and reported for their tremendous success in multiple domains. Given a specific task, developers can collect massive DNNs from public sources for efficient reusing and avoid redundant work from scratch. However, testing the performance (e.g., accuracy and robustness) of multiple DNNs and giving a reasonable recommendation that which model should be used is challenging regarding the scarcity of labeled data and demand of domain expertise. Existing testing approaches are mainly selection-based where after sampling, a few of the test data are labeled to discriminate DNNs. Therefore, due to the randomness of sampling, the performance ranking is not deterministic. In this paper, we propose a labeling-free comparison testing approach to overcome the limitations of labeling effort and sampling randomness. The main idea is to learn a Bayesian model to infer the models' specialty only based on predicted labels. To evaluate the effectiveness of our approach, we undertook exhaustive experiments on 9 benchmark datasets spanning in the domains of image, text, and source code, and 165 DNNs. In addition to accuracy, we consider the robustness against synthetic and natural distribution shifts. The experimental results demonstrate that the performance of existing approaches degrades under distribution shifts. Our approach outperforms the baseline methods by up to 0.74 and 0.53 on Spearman's correlation and Kendall's $\tau$, respectively, regardless of the dataset and distribution shift. Additionally, we investigated the impact of model quality (accuracy and robustness) and diversity (standard deviation of the quality) on the testing effectiveness and observe that there is a higher chance of a good result when the quality is over 50\% and the diversity is larger than 18\%.  ( 2 min )
    ECG Biometric Recognition: Review, System Proposal, and Benchmark Evaluation. (arXiv:2204.03992v1 [cs.LG])
    Electrocardiograms (ECGs) have shown unique patterns to distinguish between different subjects and present important advantages compared to other biometric traits, such as difficulty to counterfeit, liveness detection, and ubiquity. Also, with the success of Deep Learning technologies, ECG biometric recognition has received increasing interest in recent years. However, it is not easy to evaluate the improvements of novel ECG proposed methods, mainly due to the lack of public data and standard experimental protocols. In this study, we perform extensive analysis and comparison of different scenarios in ECG biometric recognition. Both verification and identification tasks are investigated, as well as single- and multi-session scenarios. Finally, we also perform single- and multi-lead ECG experiments, considering traditional scenarios using electrodes in the chest and limbs and current user-friendly wearable devices. In addition, we present ECGXtractor, a robust Deep Learning technology trained with an in-house large-scale database and able to operate successfully across various scenarios and multiple databases. We introduce our proposed feature extractor, trained with multiple sinus-rhythm heartbeats belonging to 55,967 subjects, and provide a general public benchmark evaluation with detailed experimental protocol. We evaluate the system performance over four different databases: i) our in-house database, ii) PTB, iii) ECG-ID, and iv) CYBHi. With the widely used PTB database, we achieve Equal Error Rates of 0.14% and 2.06% in verification, and accuracies of 100% and 96.46% in identification, respectively in single- and multi-session analysis. We release the source code, experimental protocol details, and pre-trained models in GitHub to advance in the field.  ( 2 min )
    Does Robustness on ImageNet Transfer to Downstream Tasks?. (arXiv:2204.03934v1 [cs.CV])
    As clean ImageNet accuracy nears its ceiling, the research community is increasingly more concerned about robust accuracy under distributional shifts. While a variety of methods have been proposed to robustify neural networks, these techniques often target models trained on ImageNet classification. At the same time, it is a common practice to use ImageNet pretrained backbones for downstream tasks such as object detection, semantic segmentation, and image classification from different domains. This raises a question: Can these robust image classifiers transfer robustness to downstream tasks? For object detection and semantic segmentation, we find that a vanilla Swin Transformer, a variant of Vision Transformer tailored for dense prediction tasks, transfers robustness better than Convolutional Neural Networks that are trained to be robust to the corrupted version of ImageNet. For CIFAR10 classification, we find that models that are robustified for ImageNet do not retain robustness when fully fine-tuned. These findings suggest that current robustification techniques tend to emphasize ImageNet evaluations. Moreover, network architecture is a strong source of robustness when we consider transfer learning.  ( 2 min )
    SnapMode: An Intelligent and Distributed Large-Scale Fashion Image Retrieval Platform Based On Big Data and Deep Generative Adversarial Network Technologies. (arXiv:2204.03998v1 [cs.IR])
    Fashion is now among the largest industries worldwide, for it represents human history and helps tell the worlds story. As a result of the Fourth Industrial Revolution, the Internet has become an increasingly important source of fashion information. However, with a growing number of web pages and social data, it is nearly impossible for humans to manually catch up with the ongoing evolution and the continuously variable content in this domain. The proper management and exploitation of big data can pave the way for the substantial growth of the global economy as well as citizen satisfaction. Therefore, computer scientists have found it challenging to handle e-commerce fashion websites by using big data and machine learning technologies. This paper first proposes a scalable focused Web Crawler engine based on the distributed computing platforms to extract and process fashion data on e-commerce websites. The role of the proposed platform is then described in developing a disentangled feature extraction method by employing deep convolutional generative adversarial networks (DCGANs) for content-based image indexing and retrieval. Finally, the state-of-the-art solutions are compared, and the results of the proposed approach are analyzed on a standard dataset. For the real-life implementation of the proposed solution, a Web-based application is developed on Apache Storm, Kafka, Solr, and Milvus platforms to create a fashion search engine called SnapMode.  ( 2 min )
    Channel model for end-to-end learning of communications systems: A survey. (arXiv:2204.03944v1 [cs.LG])
    The traditional communication model based on chain of multiple independent processing blocks is constraint to efficiency and introduces artificial barriers. Thus, each individually optimized block does not guarantee end-to-end performance of the system. Recently, end-to-end learning of communications systems through machine learning (ML) have been proposed to optimize the system metrics jointly over all components. These methods show performance improvements but has a limitation that it requires a differentiable channel model. In this study, we have summarized the existing approaches that alleviates this problem. We believe that this study will provide better understanding of the topic and an insight into future research in this field.  ( 2 min )
    Mel-spectrogram features for acoustic vehicle detection and speed estimation. (arXiv:2204.04013v1 [cs.LG])
    The paper addresses acoustic vehicle detection and speed estimation from single sensor measurements. We predict the vehicle's pass-by instant by minimizing clipped vehicle-to-microphone distance, which is predicted from the mel-spectrogram of input audio, in a supervised learning approach. In addition, mel-spectrogram-based features are used directly for vehicle speed estimation, without introducing any intermediate features. The results show that the proposed features can be used for accurate vehicle detection and speed estimation, with an average error of 7.87 km/h. If we formulate speed estimation as a classification problem, with a 10 km/h discretization interval, the proposed method attains the average accuracy of 48.7% for correct class prediction and 91.0% when an offset of one class is allowed. The proposed method is evaluated on a dataset of 304 urban-environment on-field recordings of ten different vehicles.  ( 2 min )
    The Complexity of Markov Equilibrium in Stochastic Games. (arXiv:2204.03991v1 [cs.LG])
    We show that computing approximate stationary Markov coarse correlated equilibria (CCE) in general-sum stochastic games is computationally intractable, even when there are two players, the game is turn-based, the discount factor is an absolute constant, and the approximation is an absolute constant. Our intractability results stand in sharp contrast to normal-form games where exact CCEs are efficiently computable. A fortiori, our results imply that there are no efficient algorithms for learning stationary Markov CCE policies in multi-agent reinforcement learning (MARL), even when the interaction is two-player and turn-based, and both the discount factor and the desired approximation of the learned policies is an absolute constant. In turn, these results stand in sharp contrast to single-agent reinforcement learning (RL) where near-optimal stationary Markov policies can be efficiently learned. Complementing our intractability results for stationary Markov CCEs, we provide a decentralized algorithm (assuming shared randomness among players) for learning a nonstationary Markov CCE policy with polynomial time and sample complexity in all problem parameters. Previous work for learning Markov CCE policies all required exponential time and sample complexity in the number of players.  ( 2 min )
    DiversiTree: Computing Diverse Sets of Near-Optimal Solutions to Mixed-Integer Optimization Problems. (arXiv:2204.03822v1 [cs.DM])
    While most methods for solving mixed-integer optimization problems seek a single optimal solution, finding a diverse set of near-optimal solutions can often be more useful. State of the art methods for generating diverse near-optimal solutions usually take a two-phase approach, first finding a set of near-optimal solutions and then finding a diverse subset. In contrast, we present a method of finding a set of diverse solutions by emphasizing diversity within the search for near-optimal solutions. Specifically, within a branch-and-bound framework, we investigate parameterized node selection rules that explicitly consider diversity. Our results indicate that our approach significantly increases diversity of the final solution set. When compared with existing methods for finding diverse near-optimal sets, our method runs with similar run-time as regular node selection methods and gives a diversity improvement of up to 140%. In contrast, popular node selection rules such as best-first search gives an improvement of no more than 40%. Further, we find that our method is most effective when diversity is emphasized more in node selection when deeper in the tree and when the solution set has grown large enough.  ( 2 min )
    Optimizing Coordinative Schedules for Tanker Terminals: An Intelligent Large Spatial-Temporal Data-Driven Approach -- Part 1. (arXiv:2204.03899v1 [cs.CE])
    In this study, a novel coordinative scheduling optimization approach is proposed to enhance port efficiency by reducing average wait time and turnaround time. The proposed approach consists of enhanced particle swarm optimization (ePSO) as kernel and augmented firefly algorithm (AFA) as global optimal search. Two paradigm methods of the proposed approach are investigated, which are batch method and rolling horizon method. The experimental results show that both paradigm methods of proposed approach can effectively enhance port efficiency. The average wait time could be significantly reduced by 86.0% - 95.5%, and the average turnaround time could eventually save 38.2% - 42.4% with respect to historical benchmarks. Moreover, the paradigm method of rolling horizon could reduce to 20 mins on running time over 3-month datasets, rather than 4 hrs on batch method at corresponding maximum performance.
    Network Shuffling: Privacy Amplification via Random Walks. (arXiv:2204.03919v1 [cs.CR])
    Recently, it is shown that shuffling can amplify the central differential privacy guarantees of data randomized with local differential privacy. Within this setup, a centralized, trusted shuffler is responsible for shuffling by keeping the identities of data anonymous, which subsequently leads to stronger privacy guarantees for systems. However, introducing a centralized entity to the originally local privacy model loses some appeals of not having any centralized entity as in local differential privacy. Moreover, implementing a shuffler in a reliable way is not trivial due to known security issues and/or requirements of advanced hardware or secure computation technology. Motivated by these practical considerations, we rethink the shuffle model to relax the assumption of requiring a centralized, trusted shuffler. We introduce network shuffling, a decentralized mechanism where users exchange data in a random-walk fashion on a network/graph, as an alternative of achieving privacy amplification via anonymity. We analyze the threat model under such a setting, and propose distributed protocols of network shuffling that is straightforward to implement in practice. Furthermore, we show that the privacy amplification rate is similar to other privacy amplification techniques such as uniform shuffling. To our best knowledge, among the recently studied intermediate trust models that leverage privacy amplification techniques, our work is the first that is not relying on any centralized entity to achieve privacy amplification.  ( 2 min )
    Study of a committee of neural networks for biometric hand-geometry recognition. (arXiv:2204.03935v1 [cs.CV])
    This Paper studies different committees of neural networks for biometric pattern recognition. We use the neural nets as classifiers for identification and verification purposes. We show that a committee of nets can improve the recognition rates when compared with a multi-start initialization algo-rithm that just picks up the neural net which offers the best performance. On the other hand, we found that there is no strong correlation between identifi-cation and verification applications using the same classifier.  ( 2 min )
    Disability prediction in multiple sclerosis using performance outcome measures and demographic data. (arXiv:2204.03969v1 [cs.LG])
    Literature on machine learning for multiple sclerosis has primarily focused on the use of neuroimaging data such as magnetic resonance imaging and clinical laboratory tests for disease identification. However, studies have shown that these modalities are not consistent with disease activity such as symptoms or disease progression. Furthermore, the cost of collecting data from these modalities is high, leading to scarce evaluations. In this work, we used multi-dimensional, affordable, physical and smartphone-based performance outcome measures (POM) in conjunction with demographic data to predict multiple sclerosis disease progression. We performed a rigorous benchmarking exercise on two datasets and present results across 13 clinically actionable prediction endpoints and 6 machine learning models. To the best of our knowledge, our results are the first to show that it is possible to predict disease progression using POMs and demographic data in the context of both clinical trials and smartphone-base studies by using two datasets. Moreover, we investigate our models to understand the impact of different POMs and demographics on model performance through feature ablation studies. We also show that model performance is similar across different demographic subgroups (based on age and sex). To enable this work, we developed an end-to-end reusable pre-processing and machine learning framework which allows quicker experimentation over disparate MS datasets.  ( 2 min )
    KGI: An Integrated Framework for Knowledge Intensive Language Tasks. (arXiv:2204.03985v1 [cs.CL])
    In a recent work, we presented a novel state-of-the-art approach to zero-shot slot filling that extends dense passage retrieval with hard negatives and robust training procedures for retrieval augmented generation models. In this paper, we propose a system based on an enhanced version of this approach where we train task specific models for other knowledge intensive language tasks, such as open domain question answering (QA), dialogue and fact checking. Our system achieves results comparable to the best models in the KILT leaderboards. Moreover, given a user query, we show how the output from these different models can be combined to cross-examine each other. Particularly, we show how accuracy in dialogue can be improved using the QA model. A short video demonstrating the system is available here - \url{https://ibm.box.com/v/kgi-interactive-demo} .  ( 2 min )
    Controllable Missingness from Uncontrollable Missingness: Joint Learning Measurement Policy and Imputation. (arXiv:2204.03872v1 [cs.LG])
    Due to the cost or interference of measurement, we need to control measurement system. Assuming that each variable can be measured sequentially, there exists optimal policy choosing next measurement for the former observations. Though optimal measurement policy is actually dependent on the goal of measurement, we mainly focus on retrieving complete data, so called as imputation. Also, we adapt the imputation method to missingness varying with measurement policy. However, learning measurement policy and imputation requires complete data which is impossible to be observed, unfortunately. To tackle this problem, we propose a data generation method and joint learning algorithm. The main idea is that 1) the data generation method is inherited by imputation method, and 2) the adaptation of imputation encourages measurement policy to learn more than individual learning. We implemented some variations of proposed algorithm for two different datasets and various missing rates. From the experimental results, we demonstrate that our algorithm is generally applicable and outperforms baseline methods.  ( 2 min )
    CD$^2$-pFed: Cyclic Distillation-guided Channel Decoupling for Model Personalization in Federated Learning. (arXiv:2204.03880v1 [cs.CV])
    Federated learning (FL) is a distributed learning paradigm that enables multiple clients to collaboratively learn a shared global model. Despite the recent progress, it remains challenging to deal with heterogeneous data clients, as the discrepant data distributions usually prevent the global model from delivering good generalization ability on each participating client. In this paper, we propose CD^2-pFed, a novel Cyclic Distillation-guided Channel Decoupling framework, to personalize the global model in FL, under various settings of data heterogeneity. Different from previous works which establish layer-wise personalization to overcome the non-IID data across different clients, we make the first attempt at channel-wise assignment for model personalization, referred to as channel decoupling. To further facilitate the collaboration between private and shared weights, we propose a novel cyclic distillation scheme to impose a consistent regularization between the local and global model representations during the federation. Guided by the cyclical distillation, our channel decoupling framework can deliver more accurate and generalized results for different kinds of heterogeneity, such as feature skew, label distribution skew, and concept shift. Comprehensive experiments on four benchmarks, including natural image and medical image analysis tasks, demonstrate the consistent effectiveness of our method on both local and external validations.  ( 2 min )
    Disentangled Latent Speech Representation for Automatic Pathological Intelligibility Assessment. (arXiv:2204.04016v1 [eess.AS])
    Speech intelligibility assessment plays an important role in the therapy of patients suffering from pathological speech disorders. Automatic and objective measures are desirable to assist therapists in their traditionally subjective and labor-intensive assessments. In this work, we investigate a novel approach for obtaining such a measure using the divergence in disentangled latent speech representations of a parallel utterance pair, obtained from a healthy reference and a pathological speaker. Experiments on an English database of Cerebral Palsy patients, using all available utterances per speaker, show high and significant correlation values (R = -0.9) with subjective intelligibility measures, while having only minimal deviation (+-0.01) across four different reference speaker pairs. We also demonstrate the robustness of the proposed method (R = -0.89 deviating +-0.02 over 1000 iterations) by considering a significantly smaller amount of utterances per speaker. Our results are among the first to show that disentangled speech representations can be used for automatic pathological speech intelligibility assessment, resulting in a reference speaker pair invariant method, applicable in scenarios with only few utterances available.  ( 2 min )
    Exploring the Universality of Hadronic Jet Classification. (arXiv:2204.03812v1 [hep-ph])
    The modeling of jet substructure significantly differs between Parton Shower Monte Carlo (PSMC) programs. Despite this, we observe that machine learning classifiers trained on different PSMCs learn nearly the same function. This means that when these classifiers are applied to the same PSMC for testing, they result in nearly the same performance. This classifier universality indicates that a machine learning model trained on one simulation and tested on another simulation (or data) will likely be optimal. Our observations are based on detailed studies of shallow and deep neural networks applied to simulated Lorentz boosted Higgs jet tagging at the LHC.  ( 2 min )
    SuperNet in Neural Architecture Search: A Taxonomic Survey. (arXiv:2204.03916v1 [cs.CV])
    Deep Neural Networks (DNN) have made significant progress in a wide range of visual recognition tasks such as image classification, object detection, and semantic segmentation. The evolution of convolutional architectures has led to better performance by incurring expensive computational costs. In addition, network design has become a difficult task, which is labor-intensive and requires a high level of domain knowledge. To mitigate such issues, there have been studies for a variety of neural architecture search methods that automatically search for optimal architectures, achieving models with impressive performance that outperform human-designed counterparts. This survey aims to provide an overview of existing works in this field of research and specifically focus on the supernet optimization that builds a neural network that assembles all the architectures as its sub models by using weight sharing. We aim to accomplish that by categorizing supernet optimization by proposing them as solutions to the common challenges found in the literature: data-side optimization, poor rank correlation alleviation, and transferable NAS for a number of deployment scenarios.  ( 2 min )
    Data-Driven Evaluation of Training Action Space for Reinforcement Learning. (arXiv:2204.03840v1 [cs.LG])
    Training action space selection for reinforcement learning (RL) is conflict-prone due to complex state-action relationships. To address this challenge, this paper proposes a Shapley-inspired methodology for training action space categorization and ranking. To reduce exponential-time shapley computations, the methodology includes a Monte Carlo simulation to avoid unnecessary explorations. The effectiveness of the methodology is illustrated using a cloud infrastructure resource tuning case study. It reduces the search space by 80\% and categorizes the training action sets into dispensable and indispensable groups. Additionally, it ranks different training actions to facilitate high-performance yet cost-efficient RL model design. The proposed data-driven methodology is extensible to different domains, use cases, and reinforcement learning algorithms.  ( 2 min )
    Engagement Detection with Multi-Task Training in E-Learning Environments. (arXiv:2204.04020v1 [cs.CV])
    Recognition of user interaction, in particular engagement detection, became highly crucial for online working and learning environments, especially during the COVID-19 outbreak. Such recognition and detection systems significantly improve the user experience and efficiency by providing valuable feedback. In this paper, we propose a novel Engagement Detection with Multi-Task Training (ED-MTT) system which minimizes mean squared error and triplet loss together to determine the engagement level of students in an e-learning environment. The performance of this system is evaluated and compared against the state-of-the-art on a publicly available dataset as well as videos collected from real-life scenarios. The results show that ED-MTT achieves 6% lower MSE than the best state-of-the-art performance with highly acceptable training time and lightweight feature extraction.
    Blockchain as an Enabler for Transfer Learning in Smart Environments. (arXiv:2204.03959v1 [cs.AI])
    The knowledge, embodied in machine learning models for intelligent systems, is commonly associated with time-consuming and costly processes such as large-scale data collection, data labelling, network training, and fine-tuning of models. Sharing and reuse of these elaborated models between intelligent systems deployed in a different environment, which is known as transfer learning, would facilitate the adoption of services for the users and accelerates the uptake of intelligent systems in environments such as smart building and smart city applications. In this context, the communication and knowledge exchange between AI-enabled environments depend on a complicated networks of systems, system of systems, digital assets, and their chain of dependencies that hardly follows the centralized schema of traditional information systems. Rather, it requires an adaptive decentralized system architecture that is empowered by features such as data provenance, workflow transparency, and validation of process participants. In this research, we propose a decentralized and adaptive software framework based on blockchain and knowledge graph technologies that supports the knowledge exchange and interoperability between IoT-enabled environments, in a transparent and trustworthy way.  ( 2 min )
    Does the Market of Citations Reward Reproducible Work?. (arXiv:2204.03829v1 [cs.DL])
    The field of bibliometrics, studying citations and behavior, is critical to the discussion of reproducibility. Citations are one of the primary incentive and reward systems for academic work, and so we desire to know if this incentive rewards reproducible work. Yet to the best of our knowledge, only one work has attempted to look at this combined space, concluding that non-reproducible work is more highly cited. We show that answering this question is more challenging than first proposed, and subtle issues can inhibit a robust conclusion. To make inferences with more robust behavior, we propose a hierarchical Bayesian model that incorporates the citation rate over time, rather than the total number of citations after a fixed amount of time. In doing so we show that, under current evidence the answer is more likely that certain fields of study such as Medicine and Machine Learning (ML) do correlate reproducible works with more citations, but other fields appear to have no relationship. Further, we find that making code available and thoroughly referencing prior works appear to also positively correlate with increased citations. Our code and data can be found at https://github.com/EdwardRaff/ReproducibleCitations .  ( 2 min )
    A posteriori learning for quasi-geostrophic turbulence parametrization. (arXiv:2204.03911v1 [physics.flu-dyn])
    The use of machine learning to build subgrid parametrizations for climate models is receiving growing attention. State-of-the-art strategies address the problem as a supervised learning task and optimize algorithms that predict subgrid fluxes based on information from coarse resolution models. In practice, training data are generated from higher resolution numerical simulations transformed in order to mimic coarse resolution simulations. By essence, these strategies optimize subgrid parametrizations to meet so-called $\textit{a priori}$ criteria. But the actual purpose of a subgrid parametrization is to obtain good performance in terms of $\textit{a posteriori}$ metrics which imply computing entire model trajectories. In this paper, we focus on the representation of energy backscatter in two dimensional quasi-geostrophic turbulence and compare parametrizations obtained with different learning strategies at fixed computational complexity. We show that strategies based on $\textit{a priori}$ criteria yield parametrizations that tend to be unstable in direct simulations and describe how subgrid parametrizations can alternatively be trained end-to-end in order to meet $\textit{a posteriori}$ criteria. We illustrate that end-to-end learning strategies yield parametrizations that outperform known empirical and data-driven schemes in terms of performance, stability and ability to apply to different flow configurations. These results support the relevance of differentiable programming paradigms for climate models in the future.  ( 2 min )
    Federated Learning with Partial Model Personalization. (arXiv:2204.03809v1 [cs.LG])
    We consider two federated learning algorithms for training partially personalized models, where the shared and personal parameters are updated either simultaneously or alternately on the devices. Both algorithms have been proposed in the literature, but their convergence properties are not fully understood, especially for the alternating variant. We provide convergence analyses of both algorithms in the general nonconvex setting with partial participation and delineate the regime where one dominates the other. Our experiments on real-world image, text, and speech datasets demonstrate that (a) partial personalization can obtain most of the benefits of full model personalization with a small fraction of personal parameters, and, (b) the alternating update algorithm often outperforms the simultaneous update algorithm.  ( 2 min )
    Decomposition-based Generation Process for Instance-Dependent Partial Label Learning. (arXiv:2204.03845v1 [cs.LG])
    Partial label learning (PLL) is a typical weakly supervised learning problem, where each training example is associated with a set of candidate labels among which only one is true. Most existing PLL approaches assume that the incorrect labels in each training example are randomly picked as the candidate labels and model the generation process of the candidate labels in a simple way. However, these approaches usually do not perform as well as expected due to the fact that the generation process of the candidate labels is always instance-dependent. Therefore, it deserves to be modeled in a refined way. In this paper, we consider instance-dependent PLL and assume that the generation process of the candidate labels could decompose into two sequential parts, where the correct label emerges first in the mind of the annotator but then the incorrect labels related to the feature are also selected with the correct label as candidate labels due to uncertainty of labeling. Motivated by this consideration, we propose a novel PLL method that performs Maximum A Posterior(MAP) based on an explicitly modeled generation process of candidate labels via decomposed probability distribution models. Experiments on benchmark and real-world datasets validate the effectiveness of the proposed method.  ( 2 min )
    Q-learning with online random forests. (arXiv:2204.03771v1 [stat.ML])
    $Q$-learning is the most fundamental model-free reinforcement learning algorithm. Deployment of $Q$-learning requires approximation of the state-action value function (also known as the $Q$-function). In this work, we provide online random forests as $Q$-function approximators and propose a novel method wherein the random forest is grown as learning proceeds (through expanding forests). We demonstrate improved performance of our methods over state-of-the-art Deep $Q$-Networks in two OpenAI gyms (`blackjack' and `inverted pendulum') but not in the `lunar lander' gym. We suspect that the resilience to overfitting enjoyed by random forests recommends our method for common tasks that do not require a strong representation of the problem domain. We show that expanding forests (in which the number of trees increases as data comes in) improve performance, suggesting that expanding forests are viable for other applications of online random forests beyond the reinforcement learning setting.  ( 2 min )
    Quantum version of the k-NN classifier based on a quantum sorting algorithm. (arXiv:2204.03761v1 [quant-ph])
    In this work we introduce a quantum sorting algorithm with adaptable requirements of memory and circuit depth, and then use it to develop a new quantum version of the classical machine learning algorithm known as k-nearest neighbors (k-NN). Both the efficiency and performance of this new quantum version of the k-NN algorithm are compared to those of the classical k-NN and another quantum version proposed by Schuld et al. \cite{Int13}. Results show that the efficiency of both quantum algorithms is similar to each other and superior to that of the classical algorithm. On the other hand, the performance of our proposed quantum k-NN algorithm is superior to the one proposed by Schuld et al. and similar to that of the classical k-NN.  ( 2 min )
    Personal VAD 2.0: Optimizing Personal Voice Activity Detection for On-Device Speech Recognition. (arXiv:2204.03793v1 [eess.AS])
    Personalization of on-device speech recognition (ASR) has seen explosive growth in recent years, largely due to the increasing popularity of personal assistant features on mobile devices and smart home speakers. In this work, we present Personal VAD 2.0, a personalized voice activity detector that detects the voice activity of a target speaker, as part of a streaming on-device ASR system. Although previous proof-of-concept studies have validated the effectiveness of Personal VAD, there are still several critical challenges to address before this model can be used in production: first, the quality must be satisfactory in both enrollment and enrollment-less scenarios; second, it should operate in a streaming fashion; and finally, the model size should be small enough to fit a limited latency and CPU/Memory budget. To meet the multi-faceted requirements, we propose a series of novel designs: 1) advanced speaker embedding modulation methods; 2) a new training paradigm to generalize to enrollment-less conditions; 3) architecture and runtime optimizations for latency and resource restrictions. Extensive experiments on a realistic speech recognition system demonstrated the state-of-the-art performance of our proposed method.  ( 2 min )
    Free Energy Evaluation Using Marginalized Annealed Importance Sampling. (arXiv:2204.03784v1 [stat.ML])
    The evaluation of the free energy of a stochastic model is considered to be a significant issue in various fields of physics and machine learning. However, the exact free energy evaluation is computationally infeasible because it includes an intractable partition function. Annealed importance sampling (AIS) is a type of importance sampling based on the Markov chain Monte Carlo method, which is similar to a simulated annealing, and can effectively approximate the free energy. This study proposes a new AIS-based approach, referred to as marginalized AIS (mAIS). The statistical efficiency of mAIS is investigated in detail based on a theoretical and numerical perspectives. Based on the investigation, it has been proved that mAIS is more effective than AIS under a certain condition.  ( 2 min )
    A Learnable Variational Model for Joint Multimodal MRI Reconstruction and Synthesis. (arXiv:2204.03804v1 [eess.IV])
    Generating multi-contrasts/modal MRI of the same anatomy enriches diagnostic information but is limited in practice due to excessive data acquisition time. In this paper, we propose a novel deep-learning model for joint reconstruction and synthesis of multi-modal MRI using incomplete k-space data of several source modalities as inputs. The output of our model includes reconstructed images of the source modalities and high-quality image synthesized in the target modality. Our proposed model is formulated as a variational problem that leverages several learnable modality-specific feature extractors and a multimodal synthesis module. We propose a learnable optimization algorithm to solve this model, which induces a multi-phase network whose parameters can be trained using multi-modal MRI data. Moreover, a bilevel-optimization framework is employed for robust parameter training. We demonstrate the effectiveness of our approach using extensive numerical experiments.  ( 2 min )
    A survey on learning from imbalanced data streams: taxonomy, challenges, empirical study, and reproducible experimental framework. (arXiv:2204.03719v1 [cs.LG])
    Class imbalance poses new challenges when it comes to classifying data streams. Many algorithms recently proposed in the literature tackle this problem using a variety of data-level, algorithm-level, and ensemble approaches. However, there is a lack of standardized and agreed-upon procedures on how to evaluate these algorithms. This work presents a taxonomy of algorithms for imbalanced data streams and proposes a standardized, exhaustive, and informative experimental testbed to evaluate algorithms in a collection of diverse and challenging imbalanced data stream scenarios. The experimental study evaluates 24 state-of-the-art data streams algorithms on 515 imbalanced data streams that combine static and dynamic class imbalance ratios, instance-level difficulties, concept drift, real-world and semi-synthetic datasets in binary and multi-class scenarios. This leads to the largest experimental study conducted so far in the data stream mining domain. We discuss the advantages and disadvantages of state-of-the-art classifiers in each of these scenarios and we provide general recommendations to end-users for selecting the best algorithms for imbalanced data streams. Additionally, we formulate open challenges and future directions for this domain. Our experimental testbed is fully reproducible and easy to extend with new methods. This way we propose the first standardized approach to conducting experiments in imbalanced data streams that can be used by other researchers to create trustworthy and fair evaluation of newly proposed methods. Our experimental framework can be downloaded from https://github.com/canoalberto/imbalanced-streams.  ( 2 min )
    Decentralized Event-Triggered Federated Learning with Heterogeneous Communication Thresholds. (arXiv:2204.03726v1 [cs.LG])
    A recent emphasis of distributed learning research has been on federated learning (FL), in which model training is conducted by the data-collecting devices. Existing research on FL has mostly focused on a star topology learning architecture with synchronized (time-triggered) model training rounds, where the local models of the devices are periodically aggregated by a centralized coordinating node. However, in many settings, such a coordinating node may not exist, motivating efforts to fully decentralize FL. In this work, we propose a novel methodology for distributed model aggregations via asynchronous, event-triggered consensus iterations over the network graph topology. We consider heterogeneous communication event thresholds at each device that weigh the change in local model parameters against the available local resources in deciding the benefit of aggregations at each iteration. Through theoretical analysis, we demonstrate that our methodology achieves asymptotic convergence to the globally optimal learning model under standard assumptions in distributed learning and graph consensus literature, and without restrictive connectivity requirements on the underlying topology. Subsequent numerical results demonstrate that our methodology obtains substantial improvements in communication requirements compared with FL baselines.  ( 2 min )
    Global ECG Classification by Self-Operational Neural Networks with Feature Injection. (arXiv:2204.03768v1 [cs.LG])
    Objective: Global (inter-patient) ECG classification for arrhythmia detection over Electrocardiogram (ECG) signal is a challenging task for both humans and machines. The main reason is the significant variations of both normal and arrhythmic ECG patterns among patients. Automating this process with utmost accuracy is, therefore, highly desirable due to the advent of wearable ECG sensors. However, even with numerous deep learning approaches proposed recently, there is still a notable gap in the performance of global and patient-specific ECG classification performances. This study proposes a novel approach to narrow this gap and propose a real-time solution with shallow and compact 1D Self-Organized Operational Neural Networks (Self-ONNs). Methods: In this study, we propose a novel approach for inter-patient ECG classification using a compact 1D Self-ONN by exploiting morphological and timing information in heart cycles. We used 1D Self-ONN layers to automatically learn morphological representations from ECG data, enabling us to capture the shape of the ECG waveform around the R peaks. We further inject temporal features based on RR interval for timing characterization. The classification layers can thus benefit from both temporal and learned features for the final arrhythmia classification. Results: Using the MIT-BIH arrhythmia benchmark database, the proposed method achieves the highest classification performance ever achieved, i.e., 99.21% precision, 99.10% recall, and 99.15% F1-score for normal (N) segments; 82.19% precision, 82.50% recall, and 82.34% F1-score for the supra-ventricular ectopic beat (SVEBs); and finally, 94.41% precision, 96.10% recall, and 95.2% F1-score for the ventricular-ectopic beats (VEBs).  ( 2 min )
    Automated Design of Salient Object Detection Algorithms with Brain Programming. (arXiv:2204.03722v1 [cs.CV])
    Despite recent improvements in computer vision, artificial visual systems' design is still daunting since an explanation of visual computing algorithms remains elusive. Salient object detection is one problem that is still open due to the difficulty of understanding the brain's inner workings. Progress on this research area follows the traditional path of hand-made designs using neuroscience knowledge. In recent years two different approaches based on genetic programming appear to enhance their technique. One follows the idea of combining previous hand-made methods through genetic programming and fuzzy logic. The other approach consists of improving the inner computational structures of basic hand-made models through artificial evolution. This research work proposes expanding the artificial dorsal stream using a recent proposal to solve salient object detection problems. This approach uses the benefits of the two main aspects of this research area: fixation prediction and detection of salient objects. We decided to apply the fusion of visual saliency and image segmentation algorithms as a template. The proposed methodology discovers several critical structures in the template through artificial evolution. We present results on a benchmark designed by experts with outstanding results in comparison with the state-of-the-art.  ( 2 min )
    Compositional Generalization and Decomposition in Neural Program Synthesis. (arXiv:2204.03758v1 [cs.LG])
    When writing programs, people have the ability to tackle a new complex task by decomposing it into smaller and more familiar subtasks. While it is difficult to measure whether neural program synthesis methods have similar capabilities, what we can measure is whether they compositionally generalize, that is, whether a model that has been trained on the simpler subtasks is subsequently able to solve more complex tasks. In this paper, we focus on measuring the ability of learned program synthesizers to compositionally generalize. We first characterize several different axes along which program synthesis methods would be desired to generalize, e.g., length generalization, or the ability to combine known subroutines in new ways that do not occur in the training data. Based on this characterization, we introduce a benchmark suite of tasks to assess these abilities based on two popular existing datasets, SCAN and RobustFill. Finally, we make first attempts to improve the compositional generalization ability of Transformer models along these axes through novel attention mechanisms that draw inspiration from a human-like decomposition strategy. Empirically, we find our modified Transformer models generally perform better than natural baselines, but the tasks remain challenging.  ( 2 min )
    Physics-assisted Generative Adversarial Network for X-Ray Tomography. (arXiv:2204.03703v1 [eess.IV])
    X-ray tomography is capable of imaging the interior of objects in three dimensions non-invasively, with applications in biomedical imaging, materials science, electronic inspection, and other fields. The reconstruction process can be an ill-conditioned inverse problem, requiring regularization to obtain satisfactory reconstructions. Recently, deep learning has been adopted for tomographic reconstruction. Unlike iterative algorithms which require a distribution that is known a priori, deep reconstruction networks can learn a prior distribution through sampling the training distributions. In this work, we develop a Physics-assisted Generative Adversarial Network (PGAN), a two-step algorithm for tomographic reconstruction. In contrast to previous efforts, our PGAN utilizes maximum-likelihood estimates derived from the measurements to regularize the reconstruction with both known physics and the learned prior. Synthetic objects with spatial correlations are integrated circuits (IC) from a proposed model CircuitFaker. Compared with maximum-likelihood estimation, PGAN can reduce the photon requirement with limited projection angles to achieve a given error rate. We further attribute the improvement to the learned prior by reconstructing objects created without spatial correlations. The advantages of using a prior from deep learning in X-ray tomography may further enable low-photon nanoscale imaging.  ( 2 min )
    GreaseVision: Rewriting the Rules of the Interface. (arXiv:2204.03731v1 [cs.HC])
    Digital harms can manifest across any interface. Key problems in addressing these harms include the high individuality of harms and the fast-changing nature of digital systems. As a result, we still lack a systematic approach to study harms and produce interventions for end-users. We put forward GreaseVision, a new framework that enables end-users to collaboratively develop interventions against harms in software using a no-code approach and recent advances in few-shot machine learning. The contribution of the framework and tool allow individual end-users to study their usage history and create personalized interventions. Our contribution also enables researchers to study the distribution of harms and interventions at scale.  ( 2 min )
    Mixing Signals: Data Augmentation Approach for Deep Learning Based Modulation Recognition. (arXiv:2204.03737v1 [eess.SP])
    With the rapid development of deep learning, automatic modulation recognition (AMR), as an important task in cognitive radio, has gradually transformed from traditional feature extraction and classification to automatic classification by deep learning technology. However, deep learning models are data-driven methods, which often require a large amount of data as the training support. Data augmentation, as the strategy of expanding dataset, can improve the generalization of the deep learning models and thus improve the accuracy of the models to a certain extent. In this paper, for AMR of radio signals, we propose a data augmentation strategy based on mixing signals and consider four specific methods (Random Mixing, Maximum-Similarity-Mixing, $\theta-$Similarity Mixing and n-times Random Mixing) to achieve data augmentation. Experiments show that our proposed method can improve the classification accuracy of deep learning based AMR models in the full public dataset RML2016.10a. In particular, for the case of a single signal-to-noise ratio signal set, the classification accuracy can be significantly improved, which verifies the effectiveness of the methods.  ( 2 min )
    A Kernel Method to Nonlinear Location Estimation with RSS-based Fingerprint. (arXiv:2204.03724v1 [cs.NI])
    This paper presents a nonlinear location estimation to infer the position of a user holding a smartphone. We consider a large location with $M$ number of grid points, each grid point is labeled with a unique fingerprint consisting of the received signal strength (RSS) values measured from $N$ number of Bluetooth Low Energy (BLE) beacons. Given the fingerprint observed by the smartphone, the user's current location can be estimated by finding the top-k similar fingerprints from the list of fingerprints registered in the database. Besides the environmental factors, the dynamicity in holding the smartphone is another source to the variation in fingerprint measurements, yet there are not many studies addressing the fingerprint variability due to dynamic smartphone positions held by human hands during online detection. To this end, we propose a nonlinear location estimation using the kernel method. Specifically, our proposed method comprises of two steps: 1) a beacon selection strategy to select a subset of beacons that is insensitive to the subtle change of holding positions, and 2) a kernel method to compute the similarity between this subset of observed signals and all the fingerprints registered in the database. The experimental results based on large-scale data collected in a complex building indicate a substantial performance gain of our proposed approach in comparison to state-of-the-art methods. The dataset consisting of the signal information collected from the beacons is available online.  ( 2 min )
    Introducing a Framework and a Decision Protocol to Calibrate Recommender Systems. (arXiv:2204.03706v1 [cs.IR])
    Recommender Systems use the user's profile to generate a recommendation list with unknown items to a target user. Although the primary goal of traditional recommendation systems is to deliver the most relevant items, such an effort unintentionally can cause collateral effects including low diversity and unbalanced genres or categories, benefiting particular groups of categories. This paper proposes an approach to create recommendation lists with a calibrated balance of genres, avoiding disproportion between the user's profile interests and the recommendation list. The calibrated recommendations consider concomitantly the relevance and the divergence between the genres distributions extracted from the user's preference and the recommendation list. The main claim is that calibration can contribute positively to generate fairer recommendations. In particular, we propose a new trade-off equation, which considers the users' bias to provide a recommendation list that seeks for the users' tendencies. Moreover, we propose a conceptual framework and a decision protocol to generate more than one thousand combinations of calibrated systems in order to find the best combination. We compare our approach against state-of-the-art approaches using multiple domain datasets, which are analyzed by rank and calibration metrics. The results indicate that the trade-off, which considers the users' bias, produces positive effects on the precision and to the fairness, thus generating recommendation lists that respect the genre distribution and, through the decision protocol, we also found the best system for each dataset.  ( 2 min )
    TemporalUV: Capturing Loose Clothing with Temporally Coherent UV Coordinates. (arXiv:2204.03671v1 [cs.CV])
    We propose a novel approach to generate temporally coherent UV coordinates for loose clothing. Our method is not constrained by human body outlines and can capture loose garments and hair. We implemented a differentiable pipeline to learn UV mapping between a sequence of RGB inputs and textures via UV coordinates. Instead of treating the UV coordinates of each frame separately, our data generation approach connects all UV coordinates via feature matching for temporal stability. Subsequently, a generative model is trained to balance the spatial quality and temporal stability. It is driven by supervised and unsupervised losses in both UV and image spaces. Our experiments show that the trained models output high-quality UV coordinates and generalize to new poses. Once a sequence of UV coordinates has been inferred by our model, it can be used to flexibly synthesize new looks and modified visual styles. Compared to existing methods, our approach reduces the computational workload to animate new outfits by several orders of magnitude.  ( 2 min )
    T4PdM: a Deep Neural Network based on the Transformer Architecture for Fault Diagnosis of Rotating Machinery. (arXiv:2204.03725v1 [cs.AI])
    Deep learning and big data algorithms have become widely used in industrial applications to optimize several tasks in many complex systems. Particularly, deep learning model for diagnosing and prognosing machinery health has leveraged predictive maintenance (PdM) to be more accurate and reliable in decision making, in this way avoiding unnecessary interventions, machinery accidents, and environment catastrophes. Recently, Transformer Neural Networks have gained notoriety and have been increasingly the favorite choice for Natural Language Processing (NLP) tasks. Thus, given their recent major achievements in NLP, this paper proposes the development of an automatic fault classifier model for predictive maintenance based on a modified version of the Transformer architecture, namely T4PdM, to identify multiple types of faults in rotating machinery. Experimental results are developed and presented for the MaFaulDa and CWRU databases. T4PdM was able to achieve an overall accuracy of 99.98% and 98% for both datasets, respectively. In addition, the performance of the proposed model is compared to other previously published works. It has demonstrated the superiority of the model in detecting and classifying faults in rotating industrial machinery. Therefore, the proposed Transformer-based model can improve the performance of machinery fault analysis and diagnostic processes and leverage companies to a new era of the Industry 4.0. In addition, this methodology can be adapted to any other task of time series classification.  ( 2 min )
    Brain-Inspired Hyperdimensional Computing: How Thermal-Friendly for Edge Computing?. (arXiv:2204.03739v1 [cs.ET])
    Brain-inspired hyperdimensional computing (HDC) is an emerging machine learning (ML) methods. It is based on large vectors of binary or bipolar symbols and a few simple mathematical operations. The promise of HDC is a highly efficient implementation for embedded systems like wearables. While fast implementations have been presented, other constraints have not been considered for edge computing. In this work, we aim at answering how thermal-friendly HDC for edge computing is. Devices like smartwatches, smart glasses, or even mobile systems have a restrictive cooling budget due to their limited volume. Although HDC operations are simple, the vectors are large, resulting in a high number of CPU operations and thus a heavy load on the entire system potentially causing temperature violations. In this work, the impact of HDC on the chip's temperature is investigated for the first time. We measure the temperature and power consumption of a commercial embedded system and compare HDC with conventional CNN. We reveal that HDC causes up to 6.8{\deg}C higher temperatures and leads to up to 47% more CPU throttling. Even when both HDC and CNN aim for the same throughput (i.e., perform a similar number of classifications per second), HDC still causes higher on-chip temperatures due to the larger power consumption.  ( 2 min )
    Using Multiple Self-Supervised Tasks Improves Model Robustness. (arXiv:2204.03714v1 [cs.CV])
    Deep networks achieve state-of-the-art performance on computer vision tasks, yet they fail under adversarial attacks that are imperceptible to humans. In this paper, we propose a novel defense that can dynamically adapt the input using the intrinsic structure from multiple self-supervised tasks. By simultaneously using many self-supervised tasks, our defense avoids over-fitting the adapted image to one specific self-supervised task and restores more intrinsic structure in the image compared to a single self-supervised task approach. Our approach further improves robustness and clean accuracy significantly compared to the state-of-the-art single task self-supervised defense. Our work is the first to connect multiple self-supervised tasks to robustness, and suggests that we can achieve better robustness with more intrinsic signal from visual data.  ( 2 min )
    BankNote-Net: Open dataset for assistive universal currency recognition. (arXiv:2204.03738v1 [cs.CV])
    Millions of people around the world have low or no vision. Assistive software applications have been developed for a variety of day-to-day tasks, including optical character recognition, scene identification, person recognition, and currency recognition. This last task, the recognition of banknotes from different denominations, has been addressed by the use of computer vision models for image recognition. However, the datasets and models available for this task are limited, both in terms of dataset size and in variety of currencies covered. In this work, we collect a total of 24,826 images of banknotes in variety of assistive settings, spanning 17 currencies and 112 denominations. Using supervised contrastive learning, we develop a machine learning model for universal currency recognition. This model learns compliant embeddings of banknote images in a variety of contexts, which can be shared publicly (as a compressed vector representation), and can be used to train and test specialized downstream models for any currency, including those not covered by our dataset or for which only a few real images per denomination are available (few-shot learning). We deploy a variation of this model for public use in the last version of the Seeing AI app developed by Microsoft. We share our encoder model and the embeddings as an open dataset in our BankNote-Net repository.  ( 2 min )
    Learning to Walk Autonomously via Reset-Free Quality-Diversity. (arXiv:2204.03655v1 [cs.LG])
    Quality-Diversity (QD) algorithms can discover large and complex behavioural repertoires consisting of both diverse and high-performing skills. However, the generation of behavioural repertoires has mainly been limited to simulation environments instead of real-world learning. This is because existing QD algorithms need large numbers of evaluations as well as episodic resets, which require manual human supervision and interventions. This paper proposes Reset-Free Quality-Diversity optimization (RF-QD) as a step towards autonomous learning for robotics in open-ended environments. We build on Dynamics-Aware Quality-Diversity (DA-QD) and introduce a behaviour selection policy that leverages the diversity of the imagined repertoire and environmental information to intelligently select of behaviours that can act as automatic resets. We demonstrate this through a task of learning to walk within defined training zones with obstacles. Our experiments show that we can learn full repertoires of legged locomotion controllers autonomously without manual resets with high sample efficiency in spite of harsh safety constraints. Finally, using an ablation of different target objectives, we show that it is important for RF-QD to have diverse types solutions available for the behaviour selection policy over solutions optimised with a specific objective. Videos and code available at https://sites.google.com/view/rf-qd.  ( 2 min )
    Identification of Autism spectrum disorder based on a novel feature selection method and Variational Autoencoder. (arXiv:2204.03654v1 [eess.IV])
    The development of noninvasive brain imaging such as resting-state functional magnetic resonance imaging (rs-fMRI) and its combination with AI algorithm provides a promising solution for the early diagnosis of Autism spectrum disorder (ASD). However, the performance of the current ASD classification based on rs-fMRI still needs to be improved. This paper introduces a classification framework to aid ASD diagnosis based on rs-fMRI. In the framework, we proposed a novel filter feature selection method based on the difference between step distribution curves (DSDC) to select remarkable functional connectivities (FCs) and utilized a multilayer perceptron (MLP) which was pretrained by a simplified Variational Autoencoder (VAE) for classification. We also designed a pipeline consisting of a normalization procedure and a modified hyperbolic tangent (tanh) activation function to replace the original tanh function, further improving the model accuracy. Our model was evaluated by 10 times 10-fold cross-validation and achieved an average accuracy of 78.12%, outperforming the state-of-the-art methods reported on the same dataset. Given the importance of sensitivity and specificity in disease diagnosis, two constraints were designed in our model which can improve the model's sensitivity and specificity by up to 9.32% and 10.21%, respectively. The added constraints allow our model to handle different application scenarios and can be used broadly.  ( 2 min )
    Qade: Solving Differential Equations on Quantum Annealers. (arXiv:2204.03657v1 [quant-ph])
    We present a general method, called Qade, for solving differential equations using a quantum annealer. The solution is obtained as a linear combination of a set of basis functions. On current devices, Qade can solve systems of coupled partial differential equations that depend linearly on the solution and its derivatives, with non-linear variable coefficients and arbitrary inhomogeneous terms. We test the method with several examples and find that state-of-the-art quantum annealers can find the solution accurately for problems requiring a small enough function basis. We provide a Python package implementing the method at gitlab.com/jccriado/qade.  ( 2 min )
    Adaptive-Gravity: A Defense Against Adversarial Samples. (arXiv:2204.03694v1 [cs.LG])
    This paper presents a novel model training solution, denoted as Adaptive-Gravity, for enhancing the robustness of deep neural network classifiers against adversarial examples. We conceptualize the model parameters/features associated with each class as a mass characterized by its centroid location and the spread (standard deviation of the distance) of features around the centroid. We use the centroid associated with each cluster to derive an anti-gravity force that pushes the centroids of different classes away from one another during network training. Then we customized an objective function that aims to concentrate each class's features toward their corresponding new centroid, which has been obtained by anti-gravity force. This methodology results in a larger separation between different masses and reduces the spread of features around each centroid. As a result, the samples are pushed away from the space that adversarial examples could be mapped to, effectively increasing the degree of perturbation needed for making an adversarial example. We have implemented this training solution as an iterative method consisting of four steps at each iteration: 1) centroid extraction, 2) anti-gravity force calculation, 3) centroid relocation, and 4) gravity training. Gravity's efficiency is evaluated by measuring the corresponding fooling rates against various attack models, including FGSM, MIM, BIM, and PGD using LeNet and ResNet110 networks, benchmarked against MNIST and CIFAR10 classification problems. Test results show that Gravity not only functions as a powerful instrument to robustify a model against state-of-the-art adversarial attacks but also effectively improves the model training accuracy.  ( 2 min )
  • Open

    MINIMALIST: Mutual INformatIon Maximization for Amortized Likelihood Inference from Sampled Trajectories. (arXiv:2106.01808v3 [cs.LG] UPDATED)
    Simulation-based inference enables learning the parameters of a model even when its likelihood cannot be computed in practice. One class of methods uses data simulated with different parameters to infer models of the likelihood-to-evidence ratio, or equivalently the posterior function. Here we frame the inference task as an estimation of an energy function parametrized with an artificial neural network. We present an intuitive approach where the optimal model of the likelihood-to-evidence ratio is found by maximizing the likelihood of simulated data. Within this framework, the connection between the task of simulation-based inference and mutual information maximization is clear, and we show how several known methods of posterior estimation relate to alternative lower bounds to mutual information. These distinct objective functions aim at the same optimal energy form and therefore can be directly benchmarked. We compare their accuracy in the inference of model parameters, focusing on four dynamical systems that encompass common challenges in time series analysis: dynamics driven by multiplicative noise, nonlinear interactions, chaotic behavior, and high-dimensional parameter space.  ( 2 min )
    Overlapping Spaces for Compact Graph Representations. (arXiv:2007.02445v3 [cs.LG] UPDATED)
    Various non-trivial spaces are becoming popular for embedding structured data such as graphs, texts, or images. Following spherical and hyperbolic spaces, more general product spaces have been proposed. However, searching for the best configuration of product space is a resource-intensive procedure, which reduces the practical applicability of the idea. We generalize the concept of product space and introduce an overlapping space that does not have the configuration search problem. The main idea is to allow subsets of coordinates to be shared between spaces of different types (Euclidean, hyperbolic, spherical). As a result, parameter optimization automatically learns the optimal configuration. Additionally, overlapping spaces allow for more compact representations since their geometry is more complex. Our experiments confirm that overlapping spaces outperform the competitors in graph embedding tasks. Here, we consider both distortion setup, where the aim is to preserve distances, and ranking setup, where the relative order should be preserved. The proposed method effectively solves the problem and outperforms the competitors in both settings. We also perform an empirical analysis in a realistic information retrieval task, where we compare all spaces by incorporating them into DSSM. In this case, the proposed overlapping space consistently achieves nearly optimal results without any configuration tuning. This allows for reducing training time, which can be significant in large-scale applications.  ( 2 min )
    Identifiability of Label Noise Transition Matrix. (arXiv:2202.02016v2 [cs.LG] UPDATED)
    The noise transition matrix plays a central role in the problem of learning from noisy labels. Among many other reasons, a significant number of existing solutions rely on access to it. Estimating the transition matrix without using ground truth labels is a critical and challenging task. When label noise transition depends on each instance, the problem of identifying the instance-dependent noise transition matrix becomes substantially more challenging. Despite recent works proposing solutions for learning from instance-dependent noisy labels, we lack a unified understanding of when such a problem remains identifiable, and therefore learnable. This paper seeks to provide answers to a sequence of related questions: What are the primary factors that contribute to the identifiability of a noise transition matrix? Can we explain the observed empirical successes? When a problem is not identifiable, what can we do to make it so? We will relate our theoretical findings to the literature and hope to provide guidelines for developing effective solutions for battling instance-dependent label noise.  ( 2 min )
    Active Linear Regression for $\ell_p$ Norms and Beyond. (arXiv:2111.04888v3 [cs.LG] UPDATED)
    We study active sampling algorithms for linear regression, which aim to query only a few entries of a target vector $b\in\mathbb R^n$ and output a near minimizer to $\min_{x\in\mathbb R^d} \|Ax-b\|$, for a design matrix $A\in\mathbb R^{n \times d}$ and loss $\|\cdot\|$. For $p$ norm regression for any $0<p<\infty$, we give an algorithm based on Lewis weight sampling outputting a $(1+\epsilon)$-approximate solution using just $\tilde O(d/\epsilon^2)$ queries to $b$ for $p\in(0,1)$, $\tilde{O}(d/\epsilon)$ queries for $1<p<2$, and $\tilde{O}(d^{p/2}/\epsilon^p)$ queries for $2<p<\infty$. For $0<p<2$, our bounds are optimal up to log factors, settling the query complexity for this range. For $2<p<\infty$, our dependence on $d$ is optimal, while our dependence on $\epsilon$ is off by at most $\epsilon$, up to log factors. Our result resolves an open question of [CD21], who gave near optimal bounds for the $1$ norm, but required $d^2/\epsilon^2$ samples for $\ell_p$ regression with $1<p<2$, and gave no bounds for $2<p<\infty$ or $0<p<1$. We also give the first total sensitivity bound of $O(d^{\max\{1,p/2\}}\log^2n)$ for loss functions of degree $p$ polynomial growth, improving a result of [TMF20]. By combining this with our techniques for $\ell_p$ regression, we obtain an active regression algorithm making $\tilde O(d^{1+\max\{1,p/2\}}/\mathrm{poly}(\epsilon))$ queries for such loss functions, including the Tukey and Huber losses, answering another question of [CD21]. For the Huber loss, we further improve our bound to $\tilde O(d^{4-2\sqrt2}/\mathrm{poly}(\epsilon))$ samples. Our sensitivity bounds also have many applications, including Orlicz norm subspace embeddings, robust subspace approximation, and dimension reduction for smoothed $p$-norms. Finally, our active sampling results give the first sublinear time algorithms for Kronecker product regression under every $p$ norm.  ( 3 min )
    On the Convergence of Stochastic Extragradient for Bilinear Games using Restarted Iteration Averaging. (arXiv:2107.00464v4 [math.OC] UPDATED)
    We study the stochastic bilinear minimax optimization problem, presenting an analysis of the same-sample Stochastic ExtraGradient (SEG) method with constant step size, and presenting variations of the method that yield favorable convergence. In sharp contrasts with the basic SEG method whose last iterate only contracts to a fixed neighborhood of the Nash equilibrium, SEG augmented with iteration averaging provably converges to the Nash equilibrium under the same standard settings, and such a rate is further improved by incorporating a scheduled restarting procedure. In the interpolation setting where noise vanishes at the Nash equilibrium, we achieve an optimal convergence rate up to tight constants. We present numerical experiments that validate our theoretical findings and demonstrate the effectiveness of the SEG method when equipped with iteration averaging and restarting.  ( 2 min )
    Covariance-Free Sparse Bayesian Learning. (arXiv:2105.10439v2 [eess.SP] UPDATED)
    Sparse Bayesian learning (SBL) is a powerful framework for tackling the sparse coding problem while also providing uncertainty quantification. The most popular inference algorithms for SBL exhibit prohibitively large computational costs for high-dimensional problems due to the need to maintain a large covariance matrix. To resolve this issue, we introduce a new method for accelerating SBL inference -- named covariance-free expectation maximization (CoFEM) -- that avoids explicit computation of the covariance matrix. CoFEM solves multiple linear systems to obtain unbiased estimates of the posterior statistics needed by SBL. This is accomplished by exploiting innovations from numerical linear algebra such as preconditioned conjugate gradient and a little-known diagonal estimation rule. For a large class of compressed sensing matrices, we provide theoretical justifications for why our method scales well in high-dimensional settings. Through simulations, we show that CoFEM can be up to thousands of times faster than existing baselines without sacrificing coding accuracy. Through applications to calcium imaging deconvolution and multi-contrast MRI reconstruction, we show that CoFEM enables SBL to tractably tackle high-dimensional sparse coding problems of practical interest.  ( 2 min )
    Q-learning with online random forests. (arXiv:2204.03771v1 [stat.ML])
    $Q$-learning is the most fundamental model-free reinforcement learning algorithm. Deployment of $Q$-learning requires approximation of the state-action value function (also known as the $Q$-function). In this work, we provide online random forests as $Q$-function approximators and propose a novel method wherein the random forest is grown as learning proceeds (through expanding forests). We demonstrate improved performance of our methods over state-of-the-art Deep $Q$-Networks in two OpenAI gyms (`blackjack' and `inverted pendulum') but not in the `lunar lander' gym. We suspect that the resilience to overfitting enjoyed by random forests recommends our method for common tasks that do not require a strong representation of the problem domain. We show that expanding forests (in which the number of trees increases as data comes in) improve performance, suggesting that expanding forests are viable for other applications of online random forests beyond the reinforcement learning setting.  ( 2 min )
    Pretext Tasks selection for multitask self-supervised speech representation learning. (arXiv:2107.00594v4 [eess.AS] UPDATED)
    Through solving pretext tasks, self-supervised learning leverages unlabeled data to extract useful latent representations replacing traditional input features in the downstream task. In audio/speech signal processing, a wide range of features where engineered through decades of research efforts. As it turns out, learning to predict such features (a.k.a pseudo-labels) has proven to be a particularly relevant pretext task, leading to useful self-supervised representations which prove to be effective for downstream tasks. However, methods and common practices for combining such pretext tasks for better performance on the downstream task have not been explored and understood properly. In fact, the process relies almost exclusively on a computationally heavy experimental procedure, which becomes intractable with the increase of the number of pretext tasks. This paper introduces a method to select a group of pretext tasks among a set of candidates. The method we propose estimates calibrated weights for the partial losses corresponding to the considered pretext tasks during the self-supervised training process. The experiments conducted on automatic speech recognition, speaker and emotion recognition validate our approach, as the groups selected and weighted with our method perform better than classic baselines, thus facilitating the selection and combination of relevant pseudo-labels for self-supervised representation learning.  ( 2 min )
    Learning Polynomial Transformations. (arXiv:2204.04209v1 [cs.LG])
    We consider the problem of learning high dimensional polynomial transformations of Gaussians. Given samples of the form $p(x)$, where $x\sim N(0, \mathrm{Id}_r)$ is hidden and $p: \mathbb{R}^r \to \mathbb{R}^d$ is a function where every output coordinate is a low-degree polynomial, the goal is to learn the distribution over $p(x)$. This problem is natural in its own right, but is also an important special case of learning deep generative models, namely pushforwards of Gaussians under two-layer neural networks with polynomial activations. Understanding the learnability of such generative models is crucial to understanding why they perform so well in practice. Our first main result is a polynomial-time algorithm for learning quadratic transformations of Gaussians in a smoothed setting. Our second main result is a polynomial-time algorithm for learning constant-degree polynomial transformations of Gaussian in a smoothed setting, when the rank of the associated tensors is small. In fact our results extend to any rotation-invariant input distribution, not just Gaussian. These are the first end-to-end guarantees for learning a pushforward under a neural network with more than one layer. Along the way, we also give the first polynomial-time algorithms with provable guarantees for tensor ring decomposition, a popular generalization of tensor decomposition that is used in practice to implicitly store large tensors.  ( 2 min )
    Two-stage Training of Graph Neural Networks for Graph Classification. (arXiv:2011.05097v4 [cs.LG] UPDATED)
    Graph neural networks (GNNs) have received massive attention in the field of machine learning on graphs. Inspired by the success of neural networks, a line of research has been conducted to train GNNs to deal with various tasks, such as node classification, graph classification, and link prediction. In this work, our task of interest is graph classification. Several GNN models have been proposed and shown great accuracy in this task. However, the question is whether usual training methods fully realize the capacity of the GNN models. In this work, we propose a two-stage training framework based on triplet loss. In the first stage, GNN is trained to map each graph to a Euclidean-space vector so that graphs of the same class are close while those of different classes are mapped far apart. Once graphs are well-separated based on labels, a classifier is trained to distinguish between different classes. This method is generic in the sense that it is compatible with any GNN model. By adapting five GNN models to our method, we demonstrate the consistent improvement in accuracy and utilization of each GNN's allocated capacity over the original training method of each model up to 5.4\% points in 12 datasets.  ( 2 min )
    Sample Complexity versus Depth: An Information Theoretic Analysis. (arXiv:2203.00246v3 [cs.LG] UPDATED)
    Deep learning has proven effective across a range of data sets. In light of this, a natural inquiry is: "for what data generating processes can deep learning succeed?" In this work, we study the sample complexity of learning multilayer data generating processes of a sort for which deep neural networks seem to be suited. We develop general and elegant information-theoretic tools that accommodate analysis of any data generating process -- shallow or deep, parametric or nonparametric, noiseless or noisy. We then use these tools to characterize the dependence of sample complexity on the depth of multilayer processes. Our results indicate roughly linear dependence on depth. This is in contrast to previous results that suggest exponential or high-order polynomial dependence.  ( 2 min )
    TF-Coder: Program Synthesis for Tensor Manipulations. (arXiv:2003.09040v4 [cs.PL] UPDATED)
    The success and popularity of deep learning is on the rise, partially due to powerful deep learning frameworks such as TensorFlow and PyTorch that make it easier to develop deep learning models. However, these libraries also come with steep learning curves, since programming in these frameworks is quite different from traditional imperative programming with explicit loops and conditionals. In this work, we present a tool called TF-Coder for programming by example in TensorFlow. TF-Coder uses a bottom-up weighted enumerative search, with value-based pruning of equivalent expressions and flexible type- and value-based filtering to ensure that expressions adhere to various requirements imposed by the TensorFlow library. We train models to predict TensorFlow operations from features of the input and output tensors and natural language descriptions of tasks, to prioritize relevant operations during search. TF-Coder solves 63 of 70 real-world tasks within 5 minutes, sometimes finding simpler solutions in less time compared to experienced human programmers.  ( 2 min )
    Neural network training under semidefinite constraints. (arXiv:2201.00632v2 [cs.LG] UPDATED)
    This paper is concerned with the training of neural networks (NNs) under semidefinite constraints, which allows for NN training with robustness and stability guarantees. In particular, we set up an efficient and scalable training scheme for NN training problems of this kind based on interior point methods, while we also exploit the structure of the underlying matrix constraint. We apply our training scheme to several relevant examples that have been studied in the literature and newly present the application of the method to the training of Wasserstein generative adversarial networks (WGANs). In numerical examples, we show the superiority of our method and its applicability to WGAN training.  ( 2 min )
    Trading off Accuracy for Speedup: Multiplier Bootstraps for Subgraph Counts. (arXiv:2009.06170v5 [stat.ME] UPDATED)
    We propose a new class of multiplier bootstraps for count functionals, ranging from a fast, approximate linear bootstrap tailored to sparse, massive graphs to a quadratic bootstrap procedure that offers refined accuracy for smaller, denser graphs. For the fast, approximate linear bootstrap, we show that $\sqrt{n}$-consistent inference of the count functional is attainable in certain computational regimes that depend on the sparsity level of the graph. Furthermore, even in more challenging regimes, we prove that our bootstrap procedure offers valid coverage and vanishing confidence intervals. For the quadratic bootstrap, we establish an Edgeworth expansion and show that this procedure offers higher-order accuracy under appropriate sparsity conditions. We complement our theoretical results with a simulation study and real data analysis and verify that our procedure offers state-of-the-art performance for several functionals.  ( 2 min )
    Free Energy Evaluation Using Marginalized Annealed Importance Sampling. (arXiv:2204.03784v1 [stat.ML])
    The evaluation of the free energy of a stochastic model is considered to be a significant issue in various fields of physics and machine learning. However, the exact free energy evaluation is computationally infeasible because it includes an intractable partition function. Annealed importance sampling (AIS) is a type of importance sampling based on the Markov chain Monte Carlo method, which is similar to a simulated annealing, and can effectively approximate the free energy. This study proposes a new AIS-based approach, referred to as marginalized AIS (mAIS). The statistical efficiency of mAIS is investigated in detail based on a theoretical and numerical perspectives. Based on the investigation, it has been proved that mAIS is more effective than AIS under a certain condition.  ( 2 min )
    Handling highly correlated genes in prediction analysis of genomic studies. (arXiv:2007.02455v4 [stat.AP] UPDATED)
    Background: Selecting feature genes to predict phenotypes is one of the typical tasks in analyzing genomics data. Though many general-purpose algorithms were developed for prediction, dealing with highly correlated genes in the prediction model is still not well addressed. High correlation among genes introduces technical problems, such as multi-collinearity issues, leading to unreliable prediction models. Furthermore, when a causal gene (whose variants have an actual biological effect on a phenotype) is highly correlated with other genes, most algorithms select the feature gene from the correlated group in a purely data-driven manner. Since the correlation structure among genes could change substantially when condition changes, the prediction model based on not correctly selected feature genes is unreliable. Therefore, we aim to keep the causal biological signal in the prediction process and build a more robust prediction model. Method: We propose a grouping algorithm, which treats highly correlated genes as a group and uses their common pattern to represent the group's biological signal in feature selection. Our novel grouping algorithm can be integrated into existing prediction algorithms to enhance their prediction performance. Our proposed grouping method has two advantages. First, using the gene group's common patterns makes the prediction more robust and reliable under condition change. Second, it reports whole correlated gene groups as discovered biomarkers for prediction tasks, allowing researchers to conduct follow-up studies to identify causal genes within the identified groups. Result: Using real benchmark scRNA-seq datasets with simulated cell phenotypes, we demonstrate our novel method significantly outperforms standard models in both (1) prediction of cell phenotypes and (2) feature gene selection.  ( 2 min )
    Seeded graph matching for the correlated Wigner model via the projected power method. (arXiv:2204.04099v1 [math.ST])
    In the graph matching problem we observe two graphs $G,H$ and the goal is to find an assignment (or matching) between their vertices such that some measure of edge agreement is maximized. We assume in this work that the observed pair $G,H$ has been drawn from the correlated Wigner model -- a popular model for correlated weighted graphs -- where the entries of the adjacency matrices of $G$ and $H$ are independent Gaussians and each edge of $G$ is correlated with one edge of $H$ (determined by the unknown matching) with the edge correlation described by a parameter $\sigma\in [0,1)$. In this paper, we analyse the performance of the projected power method (PPM) as a seeded graph matching algorithm where we are given an initial partially correct matching (called the seed) as side information. We prove that if the seed is close enough to the ground-truth matching, then with high probability, PPM iteratively improves the seed and recovers the ground-truth matching (either partially or exactly) in $\mathcal{O}(\log n)$ iterations. Our results prove that PPM works even in regimes of constant $\sigma$, thus extending the analysis in (Mao et al.,2021) for the sparse Erd\"os-Renyi model to the (dense) Wigner model. As a byproduct of our analysis, we see that the PPM framework generalizes some of the state-of-art algorithms for seeded graph matching. We support and complement our theoretical findings with numerical experiments on synthetic data.  ( 2 min )

  • Open

    Trippy AI Dream 30 - Howl's Moving Castle Post-Apocalyptic War Scenes VQ...
    submitted by /u/LordPewPew777 [link] [comments]
    Is there a AI which can turn images into simple versions of the original image?
    So that the wrinkles and shadows are removed, etc. submitted by /u/xXNOdrugsForMEXx [link] [comments]
    The Singularity is Now
    submitted by /u/ManandMultiverse [link] [comments]
    AI Graphics: Design your dream body with a slider
    submitted by /u/much_successes [link] [comments]  ( 1 min )
    Does AI exist that takes an image of a real person and edit/generate photos of them?
    submitted by /u/NootropicLove [link] [comments]
  • Open

    Circular slide rule
    I explained the basics of how a slide rule works in the previous post. But how does a circular slide rule work? Apparently the prop Mr. Spock is holding is an E6B aircraft slide rule. It includes a circular slide rule and more functionality. Start with an ordinary straight slide rule, with each bar labeled […] Circular slide rule first appeared on John D. Cook.  ( 2 min )
    Why a slide rule works
    Suppose you have two sticks. The length of one is log x, and the length of the other is log y. If you put the two sticks end to end, the combined length is log x + log y = log xy. That’s the basic idea behind a slide rule. The simplest slide rule consists […] Why a slide rule works first appeared on John D. Cook.  ( 2 min )
  • Open

    [D] Machine Learning - WAYR (What Are You Reading) - Week 135
    This is a place to share machine learning research papers, journals, and articles that you're reading this week. If it relates to what you're researching, by all means elaborate and give us your insight, otherwise it could just be an interesting paper you've read. Please try to provide some insight from your understanding and please don't post things which are present in wiki. Preferably you should link the arxiv page (not the PDF, you can easily access the PDF from the summary page but not the other way around) or any other pertinent links. Previous weeks : 1-10 11-20 21-30 31-40 41-50 51-60 61-70 71-80 81-90 91-100 101-110 111-120 121-130 131-140 Week 1 Week 11 Week 21 Week 31 Week 41 Week 51 Week 61 Week 71 Week 81 Week 91 Week 101 Week 111 Week 121 Week 131 Week 2 Week 1…  ( 1 min )
    [N]: Dall-E 2 Explained
    submitted by /u/giugiacaglia [link] [comments]  ( 1 min )
    [R] Use static classifiers for dynamic point cloud tasks (3D) and use action classifiers for temporal anomaly detection (2D) - Link to a free online lecture by the author in comments
    submitted by /u/pinter69 [link] [comments]  ( 1 min )
    [D] Simple Questions Thread
    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator [link] [comments]  ( 2 min )
    [P] image similarity metrics or algorithms
    I want to perform image similarity between images from frames of 2 different movie trailers. I am currently using SSIM and VGG 16 individually. But ssim does not capture color differences and VGG 16 isn't capturing structural integrity. I can use both together, but I wanted to know if there is any metric or algorithm which can capture both together with less discrepancies and can capture both together. Will appreciate any help. Thank you! submitted by /u/terminatorash2199 [link] [comments]  ( 1 min )
    [R] Interested in a Postdoctoral position bridging machine learning and neuroimaging?
    Recurrent neural networks for brain time series - Sano Centre for Computational Personalised Medicine submitted by /u/alecrimi [link] [comments]
    [R] ML with Intermediate Mathematics: VAEs with Normalized Flow (live series)
    Hi everyone, I'd like to share with you an exciting upcoming live series by Prof. Richard Xu of Hong Kong Baptist University. If you're interested, please click here to register! Description: "I have been planning to start a machine learning live series on topics that involve some intermediate mathematics, so I can help you to clarify some concepts. In order to fully grasp these concepts, you need to have sound knowledge of linear algebra, calculus, statistics and probability. However, if you just want to come and hear it for fun, please do so as well! The first topic is variational autoencoders with normalized flow, which I'll fully explain its beautiful mathematics over a period of a few sessions. You can find my notes on my GitHub site: https://github.com/roboticcam/machine-learning-notes/blob/master/files/vb_nf.pdf I will post the Zoom link to the registered participants. Please join us!" submitted by /u/ML_Live_Series [link] [comments]  ( 1 min )
    [research] Issues visualising a Resnet
    Hi all, We have a model used in cardiac MRi imaging, it is used to select the best image in a series of images. It consists of images -> Resnet -> LSTM -> output. The heatmap we generate from the Resnet alone shows output like the image attached, instead of actual anatomy, there is only little squares. We think this is likely due to the residual in the Resnet because it is not present in a VGG, but does anyone else have a better explanation, and an idea of how to visualise a Resnet? Resnet Saliency Map submitted by /u/Radiology_AI [link] [comments]  ( 1 min )
    [R] Critical analysis of deconfounded pretraining to improve visio-linguistic models
    Hi reddit, happy to share our new paper "Critical analysis of deconfounded pretraining to improve visio-linguistic models". In a nutshell, it's on the problem of out-of-distribution performance for visio-linguistic models, and it takes a closer look / surfaces some issues with an existing technique for improving OOD performance by doing automatic deconfounding (inspired by the causality framework of Structural Causal Models). ​ Paper: https://www.frontiersin.org/articles/10.3389/frai.2022.736791/full Code: https://github.com/Natithan/p1_causality Abstract: An important problem with many current visio-linguistic models is that they often depend on spurious correlations. A typical example of a spurious correlation between two variables is one that is due to a third variable causing …  ( 2 min )
    Best Papers That Solve Novel Problems? [D]
    We often talk about how the publish-or-perish paradigm leads to constant minor improvements on the same problems (Image classification, text generation, etc). What are some of the best papers that do the opposite? Rather than solving known problems in a marginally better way, they solve a new problem with known (or modified) methods. submitted by /u/SuspiciousWalrus99 [link] [comments]  ( 2 min )
    [Discussion] Advice on training document layout analysis models
    So, a bit of background, I am doing an RnD project in the area of improving the layout analysis of scientific documents. The proposed method is to use an active learning loop on standard object detection models and target those classes/layouts which are performing poorly and train the model based on them. Now, we have some selection strategies based on submodular selection functions to target the pages we want. I also have set up code to extract embeddings which will help me do the selection. But I don't have prior experience in active learning, especially setting it up with detectron2, because it seems to register a dataset to train, and it is really difficult to change dynamically in the middle of training, which is my use case. So I need some advice on the following:- The document analysis datasets are huge, DocBank is 50GB of images alone. How can I effectively store the embeddings in memory when I will call my selection algorithms mentioned above? How to set up an active learning loop in detectron2 for object detection. Or are there any alternatives? Some resources/code would be better There is some literature evidence suggesting that simple CNN backbone embeddings represent an image better than FasterRCNN or MaskRCNN embeddings. Specifically, this paper seems to be working on spliced image retrieval and it claims the following. Any thoughts/prior experience on this? Finally, is there any evidence supporting improvement in accuracy/precision in object detection using active learning? Or are there some better training paradigms? Thank you for your patience. submitted by /u/ExoticAd6868 [link] [comments]  ( 1 min )
    [D] Market Basket Analysis real-world examples and insights?
    I want to know more about Market Basket Analysis's real-world use case and unique insights/business value derived from performing the Association rule mining. I heard about the Beer-Diaper case study but many other sources invalidated it as a spurious correlation. Can someone share any example of business insights from Market Basket analysis and any interesting patterns they were able to observe?? submitted by /u/invincible_moron [link] [comments]  ( 2 min )
    [D] How are multiple training examples used in DMD, SINDy, etc.?
    The examples I have seen so far for DMD and SINDy use only 1 trajectory of the dynamical system for training. The input data is a 2D matrix, with the states/features being one dimension and time being the other dimension. But I want to use multiple trajectories of the same dynamical system for training, so the training data would be 3D (i.e., multiple 2D matrices). Are there examples where this has been done? Linear regression techniques (like pseudoinverse or LASSO) seem to be used to get the system matrix (in DMD) or the weights for the features (in SINDy). Can these methods be extended to 3D input data? submitted by /u/baigyaanik [link] [comments]  ( 1 min )
  • Open

    6 Business Applications that Badly Need Better AI
    The success and growth of AI is undeniable. Yet there are still basic tasks performing poorly, despite or because of automation. In some cases, you can blame reliance on outdated AI. In other cases, it is a result of corporate policies or multiple AI systems that compete against each other. The AI systems in question… Read More »6 Business Applications that Badly Need Better AI The post 6 Business Applications that Badly Need Better AI appeared first on Data Science Central.  ( 7 min )
    How exactly do you define Artificial Intelligence(AI)?
    How exactly do you define Artificial Intelligence(AI)? This looks like a back to basics/ back to school question – but the answer is not that simple Recently I was trying to find a good academic definition of AI for a research paper. Surprisingly, its not easy. In this post, I present a good definition for… Read More »How exactly do you define Artificial Intelligence(AI)? The post How exactly do you define Artificial Intelligence(AI)? appeared first on Data Science Central.  ( 2 min )
    Public wary of Meta’s Metaverse vision
    There are signs that Meta’s plans for the metaverse is faltering, including plummeting stock prices and the company’s announcement that it may withdraw from the EU market. The troubles stem from a myriad of issues, the most significant of which are data collection privacy issues and a lack of investor and public confidence in the… Read More »Public wary of Meta’s Metaverse vision The post Public wary of Meta’s Metaverse vision appeared first on Data Science Central.  ( 4 min )
    NLQ: Why You Might Not Need To Call A Data Analyst Anymore
    NLQ or Natural Language Query or Text-to-SQL or NL2SQL is an arm of computational linguistics, that helps users to fetch required data, visualizations, and insights, from sentences written in human language. As a business user, knowing data schema, table and column names, knowing metadata, having technical know-how of a BI tool or data querying skills… Read More »NLQ: Why You Might Not Need To Call A Data Analyst Anymore The post NLQ: Why You Might Not Need To Call A Data Analyst Anymore appeared first on Data Science Central.  ( 4 min )
    Advance in your finance and accounting careers with top technical skills
    Profession in finance and accounting is one of the top career choices for finance and accounting professionals. Employment of accountants and auditors is projected to grow 7 percent from the year 2020 to the year 2030, about as fast as the average for all occupations. About 135,000 openings for accountants and auditors are projected each year,… Read More »Advance in your finance and accounting careers with top technical skills The post Advance in your finance and accounting careers with top technical skills appeared first on Data Science Central.  ( 3 min )
  • Open

    Anybody ever programmed a 1st order differential equation model in MuJoCo?
    submitted by /u/SmarterCloud [link] [comments]  ( 1 min )
    Google AI Researchers Propose a Meta-Algorithm, Jump Start Reinforcement Learning, That Uses Prior Policies to Create a Learning Curriculum That Improves Performance
    In the field of artificial intelligence, reinforcement learning is a type of machine-learning strategy that rewards desirable behaviors while penalizing those which aren’t. An agent can perceive its surroundings and act accordingly through trial and error in general with this form or presence – it’s kind of like getting feedback on what works for you. However, learning rules from scratch in contexts with complex exploration problems is a big challenge in RL. Because the agent does not receive any intermediate incentives, it cannot determine how close it is to complete the goal. As a result, exploring the space at random becomes necessary until the door opens. Given the length of the task and the level of precision required, this is highly unlikely. Exploring the state space randomly with preliminary information should be avoided while performing this activity. This prior knowledge aids the agent in determining which states of the environment are desirable and should be investigated further. Offline data collected by human demonstrations, programmed policies, or other RL agents could be used to train a policy and then initiate a new RL policy. This would include copying the pre-trained policy’s neural network to the new RL policy in the scenario where we utilize neural networks to describe the procedures. This process transforms the new RL policy into a pre-trained one. However, as seen below, naively initializing a new RL policy like this frequently fails, especially for value-based RL approaches. Continue reading the summary Paper: https://arxiv.org/pdf/2204.02372.pdf Project: https://jumpstart-rl.github.io/ https://reddit.com/link/u0n5hv/video/fnktgf0wqqs81/player submitted by /u/No_Coffee_4638 [link] [comments]  ( 2 min )
    "Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language", Zeng et al 2022
    submitted by /u/gwern [link] [comments]  ( 1 min )
    Classical Dynamic Programming ve Policy Iteration
    Cracking my head trying to figure out the differences between classical dynamic programming and policy iteration. I understand that policy iteration in itself is a form of dynamic programming. But if we were to compare the traditional operations of dynamic programming with policy iteration, what would be the differences. Thank you Heaps submitted by /u/BalramVeeragoo [link] [comments]  ( 1 min )
    Is my understanding to why future rewards being considered correct?
    To my understanding, the Q value is updated like this: Q[s,a] = Q[s,a] + lr * (reward + gamma * max(Q[s,a]t+1) — Q[s,a]) Where future state reward is considered since the best current reward doesn't grantee the optimal path. E.g.: Path A: Q[s,a]t = 1, Q[s,a]t+1 = 10 Total: 11 Path B: Q[s,a]t = 5, Q[s,a]t+1 = 1 Total: 6 Not sure if this is a good analogy but here it gives the result that A is the optimal path even though its immediate reward at (t) is less than B. Further Question: Is there additional benefits to considering future rewards further than Q[s,a]t+1 ? Example: Q[s,a] = Q[s,a] + lr * (reward + gamma* max(Q[s,a]t+2) - gamma * max(Q[s,a]t+1) — Q[s,a]) submitted by /u/DangerNoodle314 [link] [comments]  ( 1 min )
    Any reason why to use several optimizers in Pytorch implementation of REDQ?
    Hi guys. I am currently implementing REDQ by modifying a working implementation of SAC (basically adapted from Spinup) and so far my implementation doesn't work, I am trying to understand why. By looking at the authors' implementation I notice they use 1 pytorch optimizer per Q network, whereas I only use 1 for all parameters. So I wonder, is there any good reason for using several optimizers here? Thanks! submitted by /u/yannbouteiller [link] [comments]  ( 1 min )
    Learning in Noisy Observation space
    I am fairly new to RL. I'm trying to train an agent (like in gym's Cartpole env) in an environment with noisy (Gaussian noise) observation. I have added Gaussian noise to angle only (not to cart position, cart velocity or ang_velocity). I was fooling around with PPO in stable_baselines but haven't had much luck. Any suggestions on what needs to be tweaked or any good algorithm for this task? Also, I tried changing the default forcing magnitude of 10 to other values like 30 but it didn't help much. Thanks submitted by /u/Black_Beard53 [link] [comments]  ( 2 min )
  • Open

    Summer school in between neuroimaging and machine learning
    submitted by /u/pasticciociccio [link] [comments]
    Dall-e and Dall-e2
    I have been into the website of OpenAI, it is unclear what has been added to Dall-e2, why should we subscribe for a github which will be made public? And what is the color code at the bottom? submitted by /u/pasticciociccio [link] [comments]  ( 1 min )
    Researchers, Including Yann Lecun, Propose ‘projUNN’: An Efficient Method For Training Deep Neural Networks With Unitary Matrices
    When deep networks or inputs involve extensive data sequences, learning in neural networks can be unstable. Recurrent states in vanilla recurrent neural networks (RNNs) are generated by repeatedly applying a linear transformation followed by a pointwise nonlinearity. This becomes unstable when the linear transformation’s eigenvalues are not of magnitude one. Unitary matrices have been utilized to solve the problem of disappearing and exploding gradients because they have eigenvalues of size one, naturally, and have been. Unitary convolutional layers have recently been developed in a similar way to aid in the development of more stable deep networks with norm-preserving transformations. The loss function’s derivative with respect to the weights is called a gradient. During backpropagation in neural networks, it is utilized to update the weights to minimize the loss function. When traveled backward with each layer, the derivative or slope continuously grows lower, resulting in a vanishing gradient. When the weight update is exponentially small, the training time is excessively long. In the worst-case scenario, the neural network training may be stopped entirely. Exploding gradients, on the other hand, occur when the slope increases with each successive layer during backpropagation. The gradient will never converge due to the high weights, causing it to oscillate around the minima without ever reaching a global minima point. Continue Reading Paper: https://arxiv.org/pdf/2203.05483.pdf submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )

  • Open

    Ran 3D art of my AI character thru ArcaneGAN; AI making art of AI.
    submitted by /u/alex-redacted [link] [comments]
    New Technology, Old Problems: The Missing Voices in Natural Language Processing
    submitted by /u/regalalgorithm [link] [comments]
    Check Out This DeepMind’s New Language Model, Chinchilla (70B Parameters), Which Significantly Outperforms Gopher (280B) and GPT-3 (175B) on a Large Range of Downstream Evaluation Tasks
    https://preview.redd.it/pkrbloq8vjs81.png?width=1422&format=png&auto=webp&s=fef693165a6c948f626de613e4e341c25f8cf5f4 ​ Extreme-scale language models have recently exhibited incredible performance on natural language processing challenges. This is due to their ever-increasing size, exceeding 500 billion parameters. However, while these models have grown in popularity in recent years, the amount of data utilized to train them has not increased. The current generation of huge language models is clearly undertrained. Three prediction approaches for optimally choosing both model size and training length have been proposed by a DeepMind research team. Three approaches have been mentioned to estimate the optimal parameter: Change the size of the models and the number of training tokens. IsoFLOP profiles Using a parametric loss function to fit a model The ultimate pretraining loss is calculated as the number of model parameters and training tokens. They minimize the loss function under the restriction of the FLOPs function, which is equal to the computational budget because the computational budget is a probabilistic function of the number of observed training tokens and model parameters. Continue Reading This Research Summary Paper: https://arxiv.org/pdf/2203.15556.pdf submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    Flaming Rose art made with snowpixelapp using AI.
    submitted by /u/AIWORQART [link] [comments]
    How do you start a professional career in the Affective Computing field?
    I'm about to graduate with a master's degree in Computer Science and I'm very passionate about Affective Computing. I would like to start looking for a job in this field, but most companies (not consulting) are looking for people with experience or a PhD. What do you recommend me to do? Continue with the PhD or try to find something, maybe in some startup? submitted by /u/_rikya_ [link] [comments]  ( 1 min )
    Deep learning to enable color vision in the dark
    submitted by /u/qptbook [link] [comments]
    Laptop for beginner?
    I'm joining MSc AI & ML this September. I want to buy a laptop. Is MacBook Air sufficient for this? If not what would you recommend to someone like me? submitted by /u/RauhanSheikh [link] [comments]  ( 1 min )
    How can I help the advancement of AI? I want to contribute and make this my career. What should I do?
    Please give a thorough and in-depth response. submitted by /u/trillswan [link] [comments]  ( 1 min )
  • Open

    Gizmo is eating a clothes basket
    submitted by /u/mspurplekris [link] [comments]
    "Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale", Ramrakhya et al 2022 {FB} (log-scaling of crowdsourced imitation learning in VR robotics)
    submitted by /u/gwern [link] [comments]  ( 1 min )
    Is it possible to implement ACER with A2C?
    I'm looking into implementing a replay buffer in A2C. I came upon the ACER [paper](https://arxiv.org/pdf/1611.01224.pdf). From my understanding, it looks like ACER is an extension of A3C, and it seems like the difference between A2C and A3C is that in A2C parameters are updated synchronously and that helps with big batch sizes. Is it still possible to implement some kind of replay buffer on A2C? Are there any papers that involve implementing a paper with A2C that you recommend I read? I'm new to the area of reinforcement learning, so I would be very grateful for any kind of help you can offer. Thanks in advance submitted by /u/lebr0n99 [link] [comments]  ( 1 min )
    Does anyone have a link to 'The RL Discord Server'
    Supposedly there is a popular discord server for the RL community, however I am having difficulty finding it. submitted by /u/jclaessens [link] [comments]
    "Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning", Qi et al 2022
    submitted by /u/gwern [link] [comments]  ( 1 min )
    Reinforcement Learning - looking for some resources
    Hello friends, I'm looking for some resources that would let me quickly start with Reinforcement Learning (preferably in Python). I have some experience with supervised learning (e.g. deep nets) and would like to complement with some RL. Preferably a walkthrough with some examples of implementation. Can you recommend something? Thanks in advance! submitted by /u/andy-codes [link] [comments]  ( 1 min )
    I'm dumb at maths: what does this mean?
    So learning about e-mail learning same have it all understood except for the max thingy. If you care enough to click this blog it's not my blog): https://towardsdatascience.com/simple-reinforcement-learning-q-learning-fcddc4b6fe56 I don't know how to turn this into to a real example: Update q values Q[state, action] = Q[state, action] + lr * (reward + gamma * np.max(Q[new_state, :]) — Q[state, action]) Specifically the last bit: np.max(Q[new_state, :]) — Q[state, action]) What does the numpy max actually operate on here? Any hard examples? Thanks. submitted by /u/Togfox [link] [comments]  ( 2 min )
  • Open

    [R] Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning
    #cvpr-2022 Happy to share our CVPR-2022 paper "Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning" Paper: https://arxiv.org/pdf/2111.14213.pdf Code: https://github.com/mmendiet/FedAlign Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices). However, the data distribution among clients is often non-IID in nature, making efficient optimization difficult. To alleviate this issue, many FL algorithms focus on mitigating the effects of data heterogeneity across clients by introducing a variety of proximal terms, some incurring considerable compute and/or memory overheads, to restrain local updates with respect to the global model. Instead, we consider rethinking solutions to data heterogeneity in FL with a focus on local learning generality rather than proximal restriction. To this end, we first present a systematic study informed by second-order indicators to better understand algorithm effectiveness in FL. Interestingly, we find that standard regularization methods are surprisingly strong performers in mitigating data heterogeneity effects. Based on our findings, we further propose a simple and effective method, FedAlign, to overcome data heterogeneity and the pitfalls of previous methods. FedAlign achieves competitive accuracy with state-of-the-art FL methods across a variety of settings while minimizing computation and memory overhead. submitted by /u/Extension-Sun1816 [link] [comments]  ( 1 min )
    [D] ICML2022 Domain conflicts system
    I was wondering if the Domain conflicts system working well? I got an email from the PCs and seems that the domain conflict system is not working well. It said that we can enter the conflict now but I cannot edit the conflict in the system. Could anyone tell me how to do it? Thanks! Dear ICML Authors, As we are seeing this happen, we just wanted to send you a brief explanation -- this only applies to some few papers. A few papers are losing reviews because of newly arising conflicts. If you did not enter your conflicts in CMT during the submission phase (as requested via CMT), then they could not be used in paper assignments. If you enter them now, any reviews by reviewers with conflicting domains will disappear, and you may see fewer reviews as a result. Unfortunately, we have no control over this, as the conflicts should have been entered when the paper was submitted. Best, Stefanie, Le and Csaba submitted by /u/Snoo_97274 [link] [comments]  ( 1 min )
    [D] Poll: How do you deploy models & endpoints?
    View Poll submitted by /u/martolini [link] [comments]  ( 1 min )
    [R][P] Generate images from text with Latent Diffusion LAION-400M Model + Gradio Demo
    submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 1 min )
    [P] tinydl - library to help with hyperparameter search and metric reporting in pytorch
    Hi everyone, I built a small library to help with hyperparameter search for deep learning models created with pytorch, because I got kinda tired of having to rewrite large pieces code over and over again. You can check it out here: https://github.com/michi-jeremias/tinydl or you can even install it with pip (pip install tinydl). I have included a readme and an example of how the library can be used. About the library, it's pretty flexible about reporting different metrics to the console and to tensorboard (add_scalar, add_hparam) at each stage of the process, like after a batch, epoch of after a whole run over multiple epochs. It can also be easily extended to include other metrics or new types of outputs. Since this is basically my first attempt at a software project that's not intended only to be used by myself I'd be happy about any feedback you have for me! If the project doesn't qualify to be posted here due to being too simple/too much on a beginner level, apologies for that. submitted by /u/abacaxiquaxi [link] [comments]  ( 1 min )
    [D] StyleGAN2 Path Length Regularization Implementation Clarification
    I am trying to implement stylegan2 and there are so many things here that are not explained either well, or at all in the paper. ​ How exactly is path length regularization implemented? In this PT code we can see that the $|J^T_w.y|$ is computed as follows: ​ def g_path_regularize(fake_img, latents, mean_path_length, decay=0.01): noise = torch.randn_like(fake_img) / math.sqrt( fake_img.shape[2] * fake_img.shape[3] ) grad, = autograd.grad( outputs=(fake_img * noise).sum(), inputs=latents, create_graph=True ) path_lengths = torch.sqrt(grad.pow(2).sum(2).mean(1)) path_mean = mean_path_length + decay * (path_lengths.mean() - mean_path_length) path_penalty = (path_lengths - path_mean).pow(2).mean() return path_penalty, path_mean.detach(), path_lengths This is based on this official TF implementation. The problem I have is that from what I understand, fake_img is 4D, and latents is 2D. The grad output in this case will be 2D and grad.pow(2).sum(2) cannot be computed because the third axis does not exist. Obviously people who are using these repos have not reported any issue regarding mismatch of shapes and axes, so I believe there is something else going on. Since I'm trying to implement this in my own network, I cannot get the desired shape any how. I get a 2D gradient output. submitted by /u/feryet [link] [comments]  ( 1 min )
    [P] Jax/Haiku pretrained models: MobileNet, ResNet, VGG, Xception.
    I released a repository of models with optional pretrained weights(Weights are taken from TF/Keras) to be used for tasks like prediction, feature extraction and fine-tuning. Github: https://github.com/abarcel/haikumodels Currently Available Models MobileNet ResNet [50, 101, 152] VGG [16, 19] Xception Also planning to release more, as soon as I find time for it. submitted by /u/abarcel [link] [comments]
    [D] Denoising in the latent space
    I spent some time reading about and playing around with speech denoising DNNs ~2019. At the time the popular architecture was U-Net (encoder -> bottleneck -> decoder with skip connections) operating on spectrograms. These U-Nets were trained directly on noisy/clean speech pairs and the loss was the difference between the predicted denoised images and actual denoised image. MSE between the predicted/actual images was a baseline loss but people alse added "feature loss" or sometimes a GAN-based loss function as well. Anyway a cursory reading of the DALL-E 2 paper has me thinking about that approach. I'm curious to know if a similar approach used for DALL-E has been tried for audio denoising: pre-train an encoder/decoder in a self-supervised fashion on a large dataset of audio train a denoiser to operate only in the latent space (ie the most compressed representation that is passed from the encoder to the decoder) step 1 - self-supervised training of encoder/decoder https://preview.redd.it/nprc40ob9gs81.png?width=1668&format=png&auto=webp&s=3a9b181ceff6c4530b5f41abf793dfb6409c0ec2 step 2 - train denoiser in latent space only ​ https://preview.redd.it/hreajdpe9gs81.png?width=1279&format=png&auto=webp&s=e16490fff00fe82487ca214b11b642ffcb30fb1c step 3 - do inference by feeding denoised latent space vector into the decoder https://preview.redd.it/7nox4ezh9gs81.png?width=2034&format=png&auto=webp&s=ccc69bbb2096d84a9e4a824000e62bb0f80fbe29 Is this a common approach already? It seems like once you have a good pretrained encoder/decoder pair then the denoiser training would be much more efficient than training an entire network that does everything at once from scratch (smaller search space, faster training loop) submitted by /u/The_Amp_Walrus [link] [comments]  ( 1 min )
    [Discussion] MLOps vs Platform Engineering
    Hey guys, I have the opportunity to either move to the platform engineering team or the freshly created MLOps team within my company. I'm interested in both careers, as I like to build Infra. I'm currently a Data Eng, and I find myself to like building apps and enabling applications to talk to each other, better than cleaning up data. I worked as a data scientist before, but I didn't like the science. I was always into engineering. What would make sense from a career perspective (both long and short term), i.e., money, promotions, attractiveness, etc. submitted by /u/dash2392 [link] [comments]  ( 1 min )
  • Open

    Distributed Reinforcement Learning for Robot Teams: A Review. (arXiv:2204.03516v1 [cs.RO])
    Purpose of review: Recent advances in sensing, actuation, and computation have opened the door to multi-robot systems consisting of hundreds/thousands of robots, with promising applications to automated manufacturing, disaster relief, harvesting, last-mile delivery, port/airport operations, or search and rescue. The community has leveraged model-free multi-agent reinforcement learning (MARL) to devise efficient, scalable controllers for multi-robot systems (MRS). This review aims to provide an analysis of the state-of-the-art in distributed MARL for multi-robot cooperation. Recent findings: Decentralized MRS face fundamental challenges, such as non-stationarity and partial observability. Building upon the "centralized training, decentralized execution" paradigm, recent MARL approaches include independent learning, centralized critic, value decomposition, and communication learning approaches. Cooperative behaviors are demonstrated through AI benchmarks and fundamental real-world robotic capabilities such as multi-robot motion/path planning. Summary: This survey reports the challenges surrounding decentralized model-free MARL for multi-robot cooperation and existing classes of approaches. We present benchmarks and robotic applications along with a discussion on current open avenues for research.  ( 2 min )
    Spatial Graph Attention and Curiosity-driven Policy for Antiviral Drug Discovery. (arXiv:2106.02190v5 [cs.LG] UPDATED)
    We developed Distilled Graph Attention Policy Network (DGAPN), a reinforcement learning model to generate novel graph-structured chemical representations that optimize user-defined objectives by efficiently navigating a physically constrained domain. The framework is examined on the task of generating molecules that are designed to bind, noncovalently, to functional sites of SARS-CoV-2 proteins. We present a spatial Graph Attention (sGAT) mechanism that leverages self-attention over both node and edge attributes as well as encoding the spatial structure -- this capability is of considerable interest in synthetic biology and drug discovery. An attentional policy network is introduced to learn the decision rules for a dynamic, fragment-based chemical environment, and state-of-the-art policy gradient techniques are employed to train the network with stability. Exploration is driven by the stochasticity of the action space design and the innovation reward bonuses learned and proposed by random network distillation. In experiments, our framework achieved outstanding results compared to state-of-the-art algorithms, while reducing the complexity of paths to chemical synthesis.  ( 2 min )
    Flexible Amortized Variational Inference in qBOLD MRI. (arXiv:2203.05845v2 [eess.IV] UPDATED)
    Streamlined qBOLD acquisitions enable experimentally straightforward observations of brain oxygen metabolism. $R_2^\prime$ maps are easily inferred; however, the Oxygen extraction fraction (OEF) and deoxygenated blood volume (DBV) are more ambiguously determined from the data. As such, existing inference methods tend to yield very noisy and underestimated OEF maps, while overestimating DBV. This work describes a novel probabilistic machine learning approach that can infer plausible distributions of OEF and DBV. Initially, we create a model that produces informative voxelwise prior distribution based on synthetic training data. Contrary to prior work, we model the joint distribution of OEF and DBV through a scaled multivariate logit-Normal distribution, which enables the values to be constrained within a plausible range. The prior distribution model is used to train an efficient amortized variational Bayesian inference model. This model learns to infer OEF and DBV by predicting real image data, with few training data required, using the signal equations as a forward model. We demonstrate that our approach enables the inference of smooth OEF and DBV maps, with a physiologically plausible distribution that can be adapted through specification of an informative prior distribution. Other benefits include model comparison (via the evidence lower bound) and uncertainty quantification for identifying image artefacts. Results are demonstrated on a small study comparing subjects undergoing hyperventilation and at rest. We illustrate that the proposed approach allows measurement of gray matter differences in OEF and DBV and enables voxelwise comparison between conditions, where we observe significant increases in OEF and $R_2^\prime$ during hyperventilation.  ( 2 min )
    First-Order Algorithms for Nonlinear Generalized Nash Equilibrium Problems. (arXiv:2204.03132v1 [math.OC])
    We consider the problem of computing an equilibrium in a class of nonlinear generalized Nash equilibrium problems (NGNEPs) in which the strategy sets for each player are defined by equality and inequality constraints that may depend on the choices of rival players. While the asymptotic global convergence and local convergence rate of solution procedures have been studied in this setting, the analysis of iteration complexity is still in its infancy. Our contribution is to provide two simple first-order algorithmic frameworks based on the quadratic penalty method and the augmented Lagrangian method, respectively, with an accelerated mirror-prox algorithm as the inner loop. We provide nonasymptotic theoretical guarantees for these algorithms. More specifically, we establish the global convergence rate of our algorithms for solving (strongly) monotone NGNEPs and we provide iteration complexity bounds expressed in terms of the number of gradient evaluations. Experimental results demonstrate the efficiency of our algorithms.  ( 2 min )
    DDOS: A MOS Prediction Framework utilizing Domain Adaptive Pre-training and Distribution of Opinion Scores. (arXiv:2204.03219v1 [eess.AS])
    Mean opinion score (MOS) is a typical subjective evaluation metric for speech synthesis systems. Since collecting MOS is time-consuming, it would be desirable if there are accurate MOS prediction models for automatic evaluation. In this work, we propose DDOS, a novel MOS prediction model. DDOS utilizes domain adaptive pre-training to further pre-train self-supervised learning models on synthetic speech. And a proposed module is added to model the opinion score distribution of each utterance. With the proposed components, DDOS outperforms previous works on BVCC dataset. And the zero shot transfer result on BC2019 dataset is significantly improved. DDOS also wins second place in Interspeech 2022 VoiceMOS challenge in terms of system-level score.  ( 2 min )
    Unsupervised Image-to-Image Translation with Generative Prior. (arXiv:2204.03641v1 [cs.CV])
    Unsupervised image-to-image translation aims to learn the translation between two visual domains without paired data. Despite the recent progress in image translation models, it remains challenging to build mappings between complex domains with drastic visual discrepancies. In this work, we present a novel framework, Generative Prior-guided UNsupervised Image-to-image Translation (GP-UNIT), to improve the overall quality and applicability of the translation algorithm. Our key insight is to leverage the generative prior from pre-trained class-conditional GANs (e.g., BigGAN) to learn rich content correspondences across various domains. We propose a novel coarse-to-fine scheme: we first distill the generative prior to capture a robust coarse-level content representation that can link objects at an abstract semantic level, based on which fine-level content features are adaptively learned for more accurate multi-level content correspondences. Extensive experiments demonstrate the superiority of our versatile framework over state-of-the-art methods in robust, high-quality and diversified translations, even for challenging and distant domains.  ( 2 min )
    SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis. (arXiv:2204.03040v1 [cs.SD])
    In this work, we present the SOMOS dataset, the first large-scale mean opinion scores (MOS) dataset consisting of solely neural text-to-speech (TTS) samples. It can be employed to train automatic MOS prediction systems focused on the assessment of modern synthesizers, and can stimulate advancements in acoustic model evaluation. It consists of 20K synthetic utterances of the LJ Speech voice, a public domain speech dataset which is a common benchmark for building neural acoustic models and vocoders. Utterances are generated from 200 TTS systems including vanilla neural acoustic models as well as models which allow prosodic variations. An LPCNet vocoder is used for all systems, so that the samples' variation depends only on the acoustic models. The synthesized utterances provide balanced and adequate domain and length coverage. We collect MOS naturalness evaluations on 3 English Amazon Mechanical Turk locales and share practices leading to reliable crowdsourced annotations for this task. Baseline results of state-of-the-art MOS prediction models on the SOMOS dataset are presented, while we show the challenges that such models face when assigned to evaluate synthetic utterances.  ( 2 min )
    Robust and Explainable Autoencoders for Unsupervised Time Series Outlier Detection---Extended Version. (arXiv:2204.03341v1 [cs.LG])
    Time series data occurs widely, and outlier detection is a fundamental problem in data mining, which has numerous applications. Existing autoencoder-based approaches deliver state-of-the-art performance on challenging real-world data but are vulnerable to outliers and exhibit low explainability. To address these two limitations, we propose robust and explainable unsupervised autoencoder frameworks that decompose an input time series into a clean time series and an outlier time series using autoencoders. Improved explainability is achieved because clean time series are better explained with easy-to-understand patterns such as trends and periodicities. We provide insight into this by means of a post-hoc explainability analysis and empirical studies. In addition, since outliers are separated from clean time series iteratively, our approach offers improved robustness to outliers, which in turn improves accuracy. We evaluate our approach on five real-world datasets and report improvements over the state-of-the-art approaches in terms of robustness and explainability. This is an extended version of "Robust and Explainable Autoencoders for Unsupervised Time Series Outlier Detection", to appear in IEEE ICDE 2022.  ( 2 min )
    Mo\"ET: Mixture of Expert Trees and its Application to Verifiable Reinforcement Learning. (arXiv:1906.06717v4 [cs.LG] UPDATED)
    Rapid advancements in deep learning have led to many recent breakthroughs. While deep learning models achieve superior performance, often statistically better than humans, their adoption into safety-critical settings, such as healthcare or self-driving cars is hindered by their inability to provide safety guarantees or to expose the inner workings of the model in a human understandable form. We present Mo\"ET, a novel model based on Mixture of Experts, consisting of decision tree experts and a generalized linear model gating function. Thanks to such gating function the model is more expressive than the standard decision tree. To support non-differentiable decision trees as experts, we formulate a novel training procedure. In addition, we introduce a hard thresholding version, Mo\"ETH, in which predictions are made solely by a single expert chosen via the gating function. Thanks to that property, Mo\"ETH allows each prediction to be easily decomposed into a set of logical rules in a form which can be easily verified. While Mo\"ET is a general use model, we illustrate its power in the reinforcement learning setting. By training Mo\"ET models using an imitation learning procedure on deep RL agents we outperform the previous state-of-the-art technique based on decision trees while preserving the verifiability of the models. Moreover, we show that Mo\"ET can also be used in real-world supervised problems on which it outperforms other verifiable machine learning models.  ( 3 min )
    Improving Cooperative Game Theory-based Data Valuation via Data Utility Learning. (arXiv:2107.06336v2 [cs.LG] UPDATED)
    The Shapley value (SV) and Least core (LC) are classic methods in cooperative game theory for cost/profit sharing problems. Both methods have recently been proposed as a principled solution for data valuation tasks, i.e., quantifying the contribution of individual datum in machine learning. However, both SV and LC suffer computational challenges due to the need for retraining models on combinatorially many data subsets. In this work, we propose to boost the efficiency in computing Shapley value or Least core by learning to estimate the performance of a learning algorithm on unseen data combinations. Theoretically, we derive bounds relating the error in the predicted learning performance to the approximation error in SV and LC. Empirically, we show that the proposed method can significantly improve the accuracy of SV and LC estimation.  ( 2 min )
    Federated Learning from Only Unlabeled Data with Class-Conditional-Sharing Clients. (arXiv:2204.03304v1 [cs.LG])
    Supervised federated learning (FL) enables multiple clients to share the trained model without sharing their labeled data. However, potential clients might even be reluctant to label their own data, which could limit the applicability of FL in practice. In this paper, we show the possibility of unsupervised FL whose model is still a classifier for predicting class labels, if the class-prior probabilities are shifted while the class-conditional distributions are shared among the unlabeled data owned by the clients. We propose federation of unsupervised learning (FedUL), where the unlabeled data are transformed into surrogate labeled data for each of the clients, a modified model is trained by supervised FL, and the wanted model is recovered from the modified model. FedUL is a very general solution to unsupervised FL: it is compatible with many supervised FL methods, and the recovery of the wanted model can be theoretically guaranteed as if the data have been labeled. Experiments on benchmark and real-world datasets demonstrate the effectiveness of FedUL. Code is available at https://github.com/lunanbit/FedUL.  ( 2 min )
    GraFN: Semi-Supervised Node Classification on Graph with Few Labels via Non-Parametric Distribution Assignment. (arXiv:2204.01303v2 [cs.LG] UPDATED)
    Despite the success of Graph Neural Networks (GNNs) on various applications, GNNs encounter significant performance degradation when the amount of supervision signals, i.e., number of labeled nodes, is limited, which is expected as GNNs are trained solely based on the supervision obtained from the labeled nodes. On the other hand,recent self-supervised learning paradigm aims to train GNNs by solving pretext tasks that do not require any labeled nodes, and it has shown to even outperform GNNs trained with few labeled nodes. However, a major drawback of self-supervised methods is that they fall short of learning class discriminative node representations since no labeled information is utilized during training. To this end, we propose a novel semi-supervised method for graphs, GraFN, that leverages few labeled nodes to ensure nodes that belong to the same class to be grouped together, thereby achieving the best of both worlds of semi-supervised and self-supervised methods. Specifically, GraFN randomly samples support nodes from labeled nodes and anchor nodes from the entire graph. Then, it minimizes the difference between two predicted class distributions that are non-parametrically assigned by anchor-supports similarity from two differently augmented graphs. We experimentally show that GraFN surpasses both the semi-supervised and self-supervised methods in terms of node classification on real-world graphs. The source code for GraFN is available at https://github.com/Junseok0207/GraFN.  ( 2 min )
    DynLight: Realize dynamic phase duration with multi-level traffic signal control. (arXiv:2204.03471v1 [cs.AI])
    Adopting reinforcement learning (RL) for traffic signal control is increasingly popular. Most RL methods use fixed action interval (denoted as tduration) and actuate or maintain a phase every tduration, which makes the phase duration less dynamic and flexible. In addition, the actuated phase can be arbitrary, affecting the real-world deployment, which requires a fixed cyclical phase structure. To address these challenges, we propose a multi-level traffic signal control framework, DynLight, which uses an optimization method Max-QueueLength (M-QL) to determine the phase and uses a deep Q-network to determine the corresponding duration. Based on DynLight, we further propose DynLight-C that adopts a well trained deep Q-network of DynLight and replace M-QL by a fixed cyclical control policy that actuate a set of phases in fixed order to realize cyclical phase structure. Comprehensive experiments on multiple real-world datasets demonstrate that DynLight achives a new state-of-the-art. Furthermore, the deep Q-network of DynLight can learn well on determining the phase duration and DynLight-C demonstrates high performance for deployment.  ( 2 min )
    Multiplayer Performative Prediction: Learning in Decision-Dependent Games. (arXiv:2201.03398v2 [cs.GT] UPDATED)
    Learning problems commonly exhibit an interesting feedback mechanism wherein the population data reacts to competing decision makers' actions. This paper formulates a new game theoretic framework for this phenomenon, called "multi-player performative prediction". We focus on two distinct solution concepts, namely (i) performatively stable equilibria and (ii) Nash equilibria of the game. The latter equilibria are arguably more informative, but can be found efficiently only when the game is monotone. We show that under mild assumptions, the performatively stable equilibria can be found efficiently by a variety of algorithms, including repeated retraining and the repeated (stochastic) gradient method. We then establish transparent sufficient conditions for strong monotonicity of the game and use them to develop algorithms for finding Nash equilibria. We investigate derivative free methods and adaptive gradient algorithms wherein each player alternates between learning a parametric description of their distribution and gradient steps on the empirical risk. Synthetic and semi-synthetic numerical experiments illustrate the results.
    Composite Spatial Monte Carlo Integration Based on Generalized Least Squares. (arXiv:2204.03248v1 [stat.CO])
    Although evaluation of the expectations on the Ising model is essential in various applications, this is frequently infeasible because of intractable multiple summations (or integrations). Spatial Monte Carlo integration (SMCI) is a sampling-based approximation, and can provide high-accuracy estimations for such intractable expectations. To evaluate the expectation of a function of variables in a specific region (called target region), SMCI considers a larger region containing the target region (called sum region). In SMCI, the multiple summation for the variables in the sum region is precisely executed, and that in the outer region is evaluated by the sampling approximation such as the standard Monte Carlo integration. It is guaranteed that the accuracy of the SMCI estimator is monotonically improved as the size of the sum region increases. However, a haphazard expansion of the sum region could cause a combinatorial explosion. Therefore, we hope to improve the accuracy without such region expansion. In this study, based on the theory of generalized least squares, a new effective method is proposed by combining multiple SMCI estimators. The validity of the proposed method is demonstrated theoretically and numerically. The results indicate that the proposed method can be effective in the inverse Ising problem (or Boltzmann machine learning).  ( 2 min )
    Automated question generation and question answering from Turkish texts. (arXiv:2111.06476v4 [cs.LG] UPDATED)
    While exam-style questions are a fundamental educational tool serving a variety of purposes, manual construction of questions is a complex process that requires training, experience and resources. Automatic question generation (QG) techniques can be utilized to satisfy the need for a continuous supply of new questions by streamlining their generation. However, compared to automatic question answering (QA), QG is a more challenging task. In this work, we fine-tune a multilingual T5 (mT5) transformer in a multi-task setting for QA, QG and answer extraction tasks using Turkish QA datasets. To the best of our knowledge, this is the first academic work that performs automated text-to-text question generation from Turkish texts. Experimental evaluations show that the proposed multi-task setting achieves state-of-the-art Turkish question answering and question generation performance on TQuADv1, TQuADv2 datasets and XQuAD Turkish split. The source code and the pre-trained models are available at https://github.com/obss/turkish-question-generation.
    MultiAuto-DeepONet: A Multi-resolution Autoencoder DeepONet for Nonlinear Dimension Reduction, Uncertainty Quantification and Operator Learning of Forward and Inverse Stochastic Problems. (arXiv:2204.03193v1 [stat.ML])
    A new data-driven method for operator learning of stochastic differential equations(SDE) is proposed in this paper. The central goal is to solve forward and inverse stochastic problems more effectively using limited data. Deep operator network(DeepONet) has been proposed recently for operator learning. Compared to other neural networks to learn functions, it aims at the problem of learning nonlinear operators. However, it can be challenging by using the original model to learn nonlinear operators for high-dimensional stochastic problems. We propose a new multi-resolution autoencoder DeepONet model referred to as MultiAuto-DeepONet to deal with this difficulty with the aid of convolutional autoencoder. The encoder part of the network is designed to reduce the dimensionality as well as discover the hidden features of high-dimensional stochastic inputs. The decoder is designed to have a special structure, i.e. in the form of DeepONet. The first DeepONet in decoder is designed to reconstruct the input function involving randomness while the second one is used to approximate the solution of desired equations. Those two DeepONets has a common branch net and two independent trunk nets. This architecture enables us to deal with multi-resolution inputs naturally. By adding $L_1$ regularization to our network, we found the outputs from the branch net and two trunk nets all have sparse structures. This reduces the number of trainable parameters in the neural network thus making the model more efficient. Finally, we conduct several numerical experiments to illustrate the effectiveness of our proposed MultiAuto-DeepONet model with uncertainty quantification.  ( 2 min )
    On Monte Carlo Tree Search for Weighted Vertex Coloring. (arXiv:2202.01665v2 [cs.LG] UPDATED)
    This work presents the first study of using the popular Monte Carlo Tree Search (MCTS) method combined with dedicated heuristics for solving the Weighted Vertex Coloring Problem. Starting with the basic MCTS algorithm, we gradually introduce a number of algorithmic variants where MCTS is extended by various simulation strategies including greedy and local search heuristics. We conduct experiments on well-known benchmark instances to assess the value of each studied combination. We also provide empirical evidence to shed light on the advantages and limits of each strategy.
    A survey on recently proposed activation functions for Deep Learning. (arXiv:2204.02921v2 [cs.LG] UPDATED)
    Artificial neural networks (ANN), typically referred to as neural networks, are a class of Machine Learning algorithms and have achieved widespread success, having been inspired by the biological structure of the human brain. Neural networks are inherently powerful due to their ability to learn complex function approximations from data. This generalization ability has been able to impact multidisciplinary areas involving image recognition, speech recognition, natural language processing, and others. Activation functions are a crucial sub-component of neural networks. They define the output of a node in the network given a set of inputs. This survey discusses the main concepts of activation functions in neural networks, including; a brief introduction to deep neural networks, a summary of what are activation functions and how they are used in neural networks, their most common properties, the different types of activation functions, some of the challenges, limitations, and alternative solutions faced by activation functions, concluding with the final remarks.
    Covariance matrix preparation for quantum principal component analysis. (arXiv:2204.03495v1 [quant-ph])
    Principal component analysis (PCA) is a dimensionality reduction method in data analysis that involves diagonalizing the covariance matrix of the dataset. Recently, quantum algorithms have been formulated for PCA based on diagonalizing a density matrix. These algorithms assume that the covariance matrix can be encoded in a density matrix, but a concrete protocol for this encoding has been lacking. Our work aims to address this gap. Assuming amplitude encoding of the data, with the data given by the ensemble $\{p_i,| \psi_i \rangle\}$, then one can easily prepare the ensemble average density matrix $\overline{\rho} = \sum_i p_i |\psi_i\rangle \langle \psi_i |$. We first show that $\overline{\rho}$ is precisely the covariance matrix whenever the dataset is centered. For quantum datasets, we exploit global phase symmetry to argue that there always exists a centered dataset consistent with $\overline{\rho}$, and hence $\overline{\rho}$ can always be interpreted as a covariance matrix. This provides a simple means for preparing the covariance matrix for arbitrary quantum datasets or centered classical datasets. For uncentered classical datasets, our method is so-called "PCA without centering", which we interpret as PCA on a symmetrized dataset. We argue that this closely corresponds to standard PCA, and we derive equations and inequalities that bound the deviation of the spectrum obtained with our method from that of standard PCA. We numerically illustrate our method for the MNIST handwritten digit dataset. We also argue that PCA on quantum datasets is natural and meaningful, and we numerically implement our method for molecular ground-state datasets.
    On the Effectiveness of Pretrained Models for API Learning. (arXiv:2204.03498v1 [cs.SE])
    Developers frequently use APIs to implement certain functionalities, such as parsing Excel Files, reading and writing text files line by line, etc. Developers can greatly benefit from automatic API usage sequence generation based on natural language queries for building applications in a faster and cleaner manner. Existing approaches utilize information retrieval models to search for matching API sequences given a query or use RNN-based encoder-decoder to generate API sequences. As it stands, the first approach treats queries and API names as bags of words. It lacks deep comprehension of the semantics of the queries. The latter approach adapts a neural language model to encode a user query into a fixed-length context vector and generate API sequences from the context vector. We want to understand the effectiveness of recent Pre-trained Transformer based Models (PTMs) for the API learning task. These PTMs are trained on large natural language corpora in an unsupervised manner to retain contextual knowledge about the language and have found success in solving similar Natural Language Processing (NLP) problems. However, the applicability of PTMs has not yet been explored for the API sequence generation task. We use a dataset that contains 7 million annotations collected from GitHub to evaluate the PTMs empirically. This dataset was also used to assess previous approaches. Based on our results, PTMs generate more accurate API sequences and outperform other related methods by around 11%. We have also identified two different tokenization approaches that can contribute to a significant boost in PTMs' performance for the API sequence generation task.
    Graph Neural Network-based Android Malware Classification with Jumping Knowledge. (arXiv:2201.07537v5 [cs.CR] UPDATED)
    This paper presents a new Android malware detection method based on Graph Neural Networks (GNNs) with Jumping-Knowledge (JK). Android function call graphs (FCGs) consist of a set of program functions and their inter-procedural calls. Thus, this paper proposes a GNN-based method for Android malware detection by capturing meaningful intra-procedural call path patterns. In addition, a Jumping-Knowledge technique is applied to minimize the effect of the over-smoothing problem, which is common in GNNs. The proposed method has been extensively evaluated using two benchmark datasets. The results demonstrate the superiority of our approach compared to state-of-the-art approaches in terms of key classification metrics, which demonstrates the potential of GNNs in Android malware detection and classification.
    Federated Learning with Erroneous Communication Links. (arXiv:2201.12991v2 [cs.LG] UPDATED)
    In this paper, we consider the federated learning (FL) problem in the presence of communication errors. We model the link between the devices and the central node (CN) by a packet erasure channel, where the local parameters from devices are either erased or received correctly by CN with probability $e$ and $1-e$, respectively. We provide mathematical proof for the convergence of the FL algorithm in the presence of communication errors, where the CN uses past local updates when the fresh updates are not received from some devices. We show via simulations that by using the past local updates, the FL algorithm can converge in the presence of communication errors. We also show that when the dataset is uniformly distributed among devices, the FL algorithm that only uses fresh updates and discards missing updates might converge faster than the FL algorithm that uses past local updates.
    An Exploration of Active Learning for Affective Digital Phenotyping. (arXiv:2204.01915v2 [cs.LG] UPDATED)
    Some of the most severe bottlenecks preventing widespread development of machine learning models for human behavior include a dearth of labeled training data and difficulty of acquiring high quality labels. Active learning is a paradigm for using algorithms to computationally select a useful subset of data points to label using metrics for model uncertainty and data similarity. We explore active learning for naturalistic computer vision emotion data, a particularly heterogeneous and complex data space due to inherently subjective labels. Using frames collected from gameplay acquired from a therapeutic smartphone game for children with autism, we run a simulation of active learning using gameplay prompts as metadata to aid in the active learning process. We find that active learning using information generated during gameplay slightly outperforms random selection of the same number of labeled frames. We next investigate a method to conduct active learning with subjective data, such as in affective computing, and where multiple crowdsourced labels can be acquired for each image. Using the Child Affective Facial Expression (CAFE) dataset, we simulate an active learning process for crowdsourcing many labels and find that prioritizing frames using the entropy of the crowdsourced label distribution results in lower categorical cross-entropy loss compared to random frame selection. Collectively, these results demonstrate pilot evaluations of two novel active learning approaches for subjective affective data collected in noisy settings.
    ECMG: Exemplar-based Commit Message Generation. (arXiv:2203.02700v2 [cs.SE] UPDATED)
    Commit messages concisely describe the content of code diffs (i.e., code changes) and the intent behind them. Recently, many approaches have been proposed to generate commit messages automatically. The information retrieval-based methods reuse the commit messages of similar code diffs, while the neural-based methods learn the semantic connection between code diffs and commit messages. However, the reused commit messages might not accurately describe the content/intent of code diffs and neural-based methods tend to generate high-frequent and repetitive tokens in the corpus. In this paper, we combine the advantages of the two technical routes and propose a novel exemplar-based neural commit message generation model, which treats the similar commit message as an exemplar and leverages it to guide the neural network model to generate an accurate commit message. We perform extensive experiments and the results confirm the effectiveness of our model.
    Causality, Causal Discovery, and Causal Inference in Structural Engineering. (arXiv:2204.01543v2 [cs.LG] UPDATED)
    Much of our experiments are designed to uncover the cause(s) and effect(s) behind a data generating mechanism (i.e., phenomenon) we happen to be interested in. Uncovering such relationships allows us to identify the true working of a phenomenon and, most importantly, articulate a model that may enable us to further explore the phenomenon on hand and/or allow us to predict it accurately. Fundamentally, such models are likely to be derived via a causal approach (as opposed to an observational or empirical mean). In this approach, causal discovery is required to create a causal model, which can then be applied to infer the influence of interventions, and answer any hypothetical questions (i.e., in the form of What ifs? Etc.) that we might have. This paper builds a case for causal discovery and causal inference and contrasts that against traditional machine learning approaches; all from a civil and structural engineering perspective. More specifically, this paper outlines the key principles of causality and the most commonly used algorithms and packages for causal discovery and causal inference. Finally, this paper also presents a series of examples and case studies of how causal concepts can be adopted for our domain.
    VNIbCReg: VICReg with Neighboring-Invariance and better-Covariance Evaluated on Non-stationary Seismic Signal Time Series. (arXiv:2204.02697v2 [cs.LG] UPDATED)
    One of the latest self-supervised learning (SSL) methods, VICReg, showed a great performance both in the linear evaluation and the fine-tuning evaluation. However, VICReg is proposed in computer vision and it learns by pulling representations of random crops of an image while maintaining the representation space by the variance and covariance loss. However, VICReg would be ineffective on non-stationary time series where different parts/crops of input should be differently encoded to consider the non-stationarity. Another recent SSL proposal, Temporal Neighborhood Coding (TNC) is effective for encoding non-stationary time series. This study shows that a combination of a VICReg-style method and TNC is very effective for SSL on non-stationary time series, where a non-stationary seismic signal time series is used as an evaluation dataset.
    Data-Centric Green AI: An Exploratory Empirical Study. (arXiv:2204.02766v2 [cs.LG] UPDATED)
    With the growing availability of large-scale datasets, and the popularization of affordable storage and computational capabilities, the energy consumed by AI is becoming a growing concern. To address this issue, in recent years, studies have focused on demonstrating how AI energy efficiency can be improved by tuning the model training strategy. Nevertheless, how modifications applied to datasets can impact the energy consumption of AI is still an open question. To fill this gap, in this exploratory study, we evaluate if data-centric approaches can be utilized to improve AI energy efficiency. To achieve our goal, we conduct an empirical experiment, executed by considering 6 different AI algorithms, a dataset comprising 5,574 data points, and two dataset modifications (number of data points and number of features). Our results show evidence that, by exclusively conducting modifications on datasets, energy consumption can be drastically reduced (up to 92.16%), often at the cost of a negligible or even absent accuracy decline. As additional introductory results, we demonstrate how, by exclusively changing the algorithm used, energy savings up to two orders of magnitude can be achieved. In conclusion, this exploratory investigation empirically demonstrates the importance of applying data-centric techniques to improve AI energy efficiency. Our results call for a research agenda that focuses on data-centric techniques, to further enable and democratize Green AI.
    Online Bootstrap Inference For Policy Evaluation in Reinforcement Learning. (arXiv:2108.03706v2 [stat.ML] UPDATED)
    The recent emergence of reinforcement learning has created a demand for robust statistical inference methods for the parameter estimates computed using these algorithms. Existing methods for statistical inference in online learning are restricted to settings involving independently sampled observations, while existing statistical inference methods in reinforcement learning (RL) are limited to the batch setting. The online bootstrap is a flexible and efficient approach for statistical inference in linear stochastic approximation algorithms, but its efficacy in settings involving Markov noise, such as RL, has yet to be explored. In this paper, we study the use of the online bootstrap method for statistical inference in RL. In particular, we focus on the temporal difference (TD) learning and Gradient TD (GTD) learning algorithms, which are themselves special instances of linear stochastic approximation under Markov noise. The method is shown to be distributionally consistent for statistical inference in policy evaluation, and numerical experiments are included to demonstrate the effectiveness of this algorithm at statistical inference tasks across a range of real RL environments.
    Convergence and Optimality of Policy Gradient Methods in Weakly Smooth Settings. (arXiv:2111.00185v2 [cs.LG] UPDATED)
    Policy gradient methods have been frequently applied to problems in control and reinforcement learning with great success, yet existing convergence analysis still relies on non-intuitive, impractical and often opaque conditions. In particular, existing rates are achieved in limited settings, under strict regularity conditions. In this work, we establish explicit convergence rates of policy gradient methods, extending the convergence regime to weakly smooth policy classes with $L_2$ integrable gradient. We provide intuitive examples to illustrate the insight behind these new conditions. Notably, our analysis also shows that convergence rates are achievable for both the standard policy gradient and the natural policy gradient algorithms under these assumptions. Lastly we provide performance guarantees for the converged policies.
    Motion-from-Blur: 3D Shape and Motion Estimation of Motion-blurred Objects in Videos. (arXiv:2111.14465v2 [cs.CV] UPDATED)
    We propose a method for jointly estimating the 3D motion, 3D shape, and appearance of highly motion-blurred objects from a video. To this end, we model the blurred appearance of a fast moving object in a generative fashion by parametrizing its 3D position, rotation, velocity, acceleration, bounces, shape, and texture over the duration of a predefined time window spanning multiple frames. Using differentiable rendering, we are able to estimate all parameters by minimizing the pixel-wise reprojection error to the input video via backpropagating through a rendering pipeline that accounts for motion blur by averaging the graphics output over short time intervals. For that purpose, we also estimate the camera exposure gap time within the same optimization. To account for abrupt motion changes like bounces, we model the motion trajectory as a piece-wise polynomial, and we are able to estimate the specific time of the bounce at sub-frame accuracy. Experiments on established benchmark datasets demonstrate that our method outperforms previous methods for fast moving object deblurring and 3D reconstruction.
    Federated Learning of Generative Image Priors for MRI Reconstruction. (arXiv:2202.04175v2 [eess.IV] UPDATED)
    Multi-institutional efforts can facilitate training of deep MRI reconstruction models, albeit privacy risks arise during cross-site sharing of imaging data. Federated learning (FL) has recently been introduced to address privacy concerns by enabling distributed training without transfer of imaging data. Existing FL methods for MRI reconstruction employ conditional models to map from undersampled to fully-sampled acquisitions via explicit knowledge of the imaging operator. Since conditional models generalize poorly across different acceleration rates or sampling densities, imaging operators must be fixed between training and testing, and they are typically matched across sites. To improve generalization and flexibility in multi-institutional collaborations, here we introduce a novel method for MRI reconstruction based on Federated learning of Generative IMage Priors (FedGIMP). FedGIMP leverages a two-stage approach: cross-site learning of a generative MRI prior, and subject-specific injection of the imaging operator. The global MRI prior is learned via an unconditional adversarial model that synthesizes high-quality MR images based on latent variables. Specificity in the prior is preserved via a mapper subnetwork that produces site-specific latents. During inference, the prior is combined with subject-specific imaging operators to enable reconstruction, and further adapted to individual test samples by minimizing data-consistency loss. Comprehensive experiments on multi-institutional datasets clearly demonstrate enhanced generalization performance of FedGIMP against site-specific and federated methods based on conditional models, as well as traditional reconstruction methods.
    Machine Learning based Medical Image Deepfake Detection: A Comparative Study. (arXiv:2109.12800v2 [cs.CV] UPDATED)
    Deep generative networks in recent years have reinforced the need for caution while consuming various modalities of digital information. One avenue of deepfake creation is aligned with injection and removal of tumors from medical scans. Failure to detect medical deepfakes can lead to large setbacks on hospital resources or even loss of life. This paper attempts to address the detection of such attacks with a structured case study. Specifically, we evaluate eight different machine learning algorithms, which including three conventional machine learning methods, support vector machine, random forest, decision tree, and five deep learning models, DenseNet121, DenseNet201, ResNet50, ResNet101, VGG19, on distinguishing between tampered and untampered images.For deep learning models, the five models are used for feature extraction, then fine-tune for each pre-trained model is performed. The findings of this work show near perfect accuracy in detecting instances of tumor injections and removals.
    Reinforcement Learning with Almost Sure Constraints. (arXiv:2112.05198v2 [cs.LG] UPDATED)
    In this work we address the problem of finding feasible policies for Constrained Markov Decision Processes under probability one constraints. We argue that stationary policies are not sufficient for solving this problem, and that a rich class of policies can be found by endowing the controller with a scalar quantity, so called budget, that tracks how close the agent is to violating the constraint. We show that the minimal budget required to act safely can be obtained as the smallest fixed point of a Bellman-like operator, for which we analyze its convergence properties. We also show how to learn this quantity when the true kernel of the Markov decision process is not known, while providing sample-complexity bounds. The utility of knowing this minimal budget relies in that it can aid in the search of optimal or near-optimal policies by shrinking down the region of the state space the agent must navigate. Simulations illustrate the different nature of probability one constraints against the typically used constraints in expectation.
    Quantum Distributed Deep Learning Architectures: Models, Discussions, and Applications. (arXiv:2202.11200v3 [quant-ph] UPDATED)
    Although deep learning (DL) has already become a state-of-the-art technology for various data processing tasks, data security and computational overload problems often arise due to their high data and computational power dependency. To solve this problem, quantum deep learning (QDL) and distributed deep learning (DDL) has emerged to complement existing DL methods. Furthermore, a quantum distributed deep learning (QDDL) technique that combines and maximizes these advantages is getting attention. This paper compares several model structures for QDDL and discusses their possibilities and limitations to leverage QDDL for some representative application scenarios.
    Multiscale Clustering of Hyperspectral Images Through Spectral-Spatial Diffusion Geometry. (arXiv:2103.15783v2 [cs.LG] UPDATED)
    Clustering algorithms partition a dataset into groups of similar points. The primary contribution of this article is the Multiscale Spatially-Regularized Diffusion Learning (M-SRDL) clustering algorithm, which uses spatially-regularized diffusion distances to efficiently and accurately learn multiple scales of latent structure in hyperspectral images. The M-SRDL clustering algorithm extracts clusterings at many scales from a hyperspectral image and outputs these clusterings' variation of information-barycenter as an exemplar for all underlying cluster structure. We show that incorporating spatial regularization into a multiscale clustering framework results in smoother and more coherent clusters when applied to hyperspectral data, yielding more accurate clustering labels.
    SplitAVG: A heterogeneity-aware federated deep learning method for medical imaging. (arXiv:2107.02375v4 [cs.LG] UPDATED)
    Federated learning is an emerging research paradigm for enabling collaboratively training deep learning models without sharing patient data. However, the data from different institutions are usually heterogeneous across institutions, which may reduce the performance of models trained using federated learning. In this study, we propose a novel heterogeneity-aware federated learning method, SplitAVG, to overcome the performance drops from data heterogeneity in federated learning. Unlike previous federated methods that require complex heuristic training or hyper parameter tuning, our SplitAVG leverages the simple network split and feature map concatenation strategies to encourage the federated model training an unbiased estimator of the target data distribution. We compare SplitAVG with seven state-of-the-art federated learning methods, using centrally hosted training data as the baseline on a suite of both synthetic and real-world federated datasets. We find that the performance of models trained using all the comparison federated learning methods degraded significantly with the increasing degrees of data heterogeneity. In contrast, SplitAVG method achieves comparable results to the baseline method under all heterogeneous settings, that it achieves 96.2% of the accuracy and 110.4% of the mean absolute error obtained by the baseline in a diabetic retinopathy binary classification dataset and a bone age prediction dataset, respectively, on highly heterogeneous data partitions. We conclude that SplitAVG method can effectively overcome the performance drops from variability in data distributions across institutions. Experimental results also show that SplitAVG can be adapted to different base networks and generalized to various types of medical imaging tasks.
    GFlowNet Foundations. (arXiv:2111.09266v2 [cs.LG] UPDATED)
    Generative Flow Networks (GFlowNets) have been introduced as a method to sample a diverse set of candidates in an active learning context, with a training objective that makes them approximately sample in proportion to a given reward function. In this paper, we show a number of additional theoretical properties of GFlowNets. They can be used to estimate joint probability distributions and the corresponding marginal distributions where some variables are unspecified and, of particular interest, can represent distributions over composite objects like sets and graphs. GFlowNets amortize the work typically done by computationally expensive MCMC methods in a single but trained generative pass. They could also be used to estimate partition functions and free energies, conditional probabilities of supersets (supergraphs) given a subset (subgraph), as well as marginal distributions over all supersets (supergraphs) of a given set (graph). We introduce variations enabling the estimation of entropy and mutual information, sampling from a Pareto frontier, connections to reward-maximizing policies, and extensions to stochastic environments, continuous actions and modular energy functions.
    Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation. (arXiv:2111.14826v2 [cs.CV] UPDATED)
    The nonuniform quantization strategy for compressing neural networks usually achieves better performance than its counterpart, i.e., uniform strategy, due to its superior representational capacity. However, many nonuniform quantization methods overlook the complicated projection process in implementing the nonuniformly quantized weights/activations, which incurs non-negligible time and space overhead in hardware deployment. In this study, we propose Nonuniform-to-Uniform Quantization (N2UQ), a method that can maintain the strong representation ability of nonuniform methods while being hardware-friendly and efficient as the uniform quantization for model inference. We achieve this through learning the flexible in-equidistant input thresholds to better fit the underlying distribution while quantizing these real-valued inputs into equidistant output levels. To train the quantized network with learnable input thresholds, we introduce a generalized straight-through estimator (G-STE) for intractable backward derivative calculation w.r.t. threshold parameters. Additionally, we consider entropy preserving regularization to further reduce information loss in weight quantization. Even under this adverse constraint of imposing uniformly quantized weights and activations, our N2UQ outperforms state-of-the-art nonuniform quantization methods by 0.5~1.7 on ImageNet, demonstrating the contribution of N2UQ design. Code and models are available at: https://github.com/liuzechun/Nonuniform-to-Uniform-Quantization.
    Margin Calibration for Long-Tailed Visual Recognition. (arXiv:2112.07225v2 [cs.CV] UPDATED)
    The long-tailed class distribution in visual recognition tasks poses great challenges for neural networks on how to handle the biased predictions between head and tail classes, i.e., the model tends to classify tail classes as head classes. While existing research focused on data resampling and loss function engineering, in this paper, we take a different perspective: the classification margins. We study the relationship between the margins and logits (classification scores) and empirically observe the biased margins and the biased logits are positively correlated. We propose MARC, a simple yet effective MARgin Calibration function to dynamically calibrate the biased margins for unbiased logits. We validate MARC through extensive experiments on common long-tailed benchmarks including CIFAR-LT, ImageNet-LT, Places-LT, and iNaturalist-LT. Experimental results demonstrate that our MARC achieves favorable results on these benchmarks. In addition, MARC is extremely easy to implement with just three lines of code. We hope this simple method will motivate people to rethink the biased margins and biased logits in long-tailed visual recognition.
    Slicing Aided Hyper Inference and Fine-tuning for Small Object Detection. (arXiv:2202.06934v3 [cs.CV] UPDATED)
    Detection of small objects and objects far away in the scene is a major challenge in surveillance applications. Such objects are represented by small number of pixels in the image and lack sufficient details, making them difficult to detect using conventional detectors. In this work, an open-source framework called Slicing Aided Hyper Inference (SAHI) is proposed that provides a generic slicing aided inference and fine-tuning pipeline for small object detection. The proposed technique is generic in the sense that it can be applied on top of any available object detector without any fine-tuning. Experimental evaluations, using object detection baselines on the Visdrone and xView aerial object detection datasets show that the proposed inference method can increase object detection AP by 6.8%, 5.1% and 5.3% for FCOS, VFNet and TOOD detectors, respectively. Moreover, the detection accuracy can be further increased with a slicing aided fine-tuning, resulting in a cumulative increase of 12.7%, 13.4% and 14.5% AP in the same order. Proposed technique has been integrated with Detectron2, MMDetection and YOLOv5 models and it is publicly available at https://github.com/obss/sahi.git .
    Reinforcement Learning for Linear Quadratic Control is Vulnerable Under Cost Manipulation. (arXiv:2203.05774v2 [eess.SY] UPDATED)
    In this work, we study the deception of a Linear-Quadratic-Gaussian (LQG) agent by manipulating the cost signals. We show that a small falsification of the cost parameters will only lead to a bounded change in the optimal policy. The bound is linear on the amount of falsification the attacker can apply to the cost parameters. We propose an attack model where the attacker aims to mislead the agent into learning a `nefarious' policy by intentionally falsifying the cost parameters. We formulate the attack's problem as a convex optimization problem and develop necessary and sufficient conditions to check the achievability of the attacker's goal. We showcase the adversarial manipulation on two types of LQG learners: the batch RL learner and the other is the adaptive dynamic programming (ADP) learner. Our results demonstrate that with only 2.296% of falsification on the cost data, the attacker misleads the batch RL into learning the 'nefarious' policy that leads the vehicle to a dangerous position. The attacker can also gradually trick the ADP learner into learning the same `nefarious' policy by consistently feeding the learner a falsified cost signal that stays close to the actual cost signal. The paper aims to raise people's awareness of the security threats faced by RL-enabled control systems.
    Membership Inference Attacks Against Self-supervised Speech Models. (arXiv:2111.05113v2 [cs.CR] UPDATED)
    Recently, adapting the idea of self-supervised learning (SSL) on continuous speech has started gaining attention. SSL models pre-trained on a huge amount of unlabeled audio can generate general-purpose representations that benefit a wide variety of speech processing tasks. Despite their ubiquitous deployment, however, the potential privacy risks of these models have not been well investigated. In this paper, we present the first privacy analysis on several SSL speech models using Membership Inference Attacks (MIA) under black-box access. The experiment results show that these pre-trained models are vulnerable to MIA and prone to membership information leakage with high Area Under the Curve (AUC) in both utterance-level and speaker-level. Furthermore, we also conduct several ablation studies to understand the factors that contribute to the success of MIA.
    Scientific Discovery and the Cost of Measurement -- Balancing Information and Cost in Reinforcement Learning. (arXiv:2112.07535v2 [cs.LG] UPDATED)
    The use of reinforcement learning (RL) in scientific applications, such as materials design and automated chemistry, is increasing. A major challenge, however, lies in fact that measuring the state of the system is often costly and time consuming in scientific applications, whereas policy learning with RL requires a measurement after each time step. In this work, we make the measurement costs explicit in the form of a costed reward and propose a framework that enables off-the-shelf deep RL algorithms to learn a policy for both selecting actions and determining whether or not to measure the current state of the system at each time step. In this way, the agents learn to balance the need for information with the cost of information. Our results show that when trained under this regime, the Dueling DQN and PPO agents can learn optimal action policies whilst making up to 50\% fewer state measurements, and recurrent neural networks can produce a greater than 50\% reduction in measurements. We postulate the these reduction can help to lower the barrier to applying RL to real-world scientific applications.
    Control Theoretic Analysis of Temporal Difference Learning. (arXiv:2112.14417v4 [cs.AI] UPDATED)
    The goal of this paper is to investigate a control theoretic analysis of linear stochastic iterative algorithm and temporal difference (TD) learning. TD-learning is a linear stochastic iterative algorithm to estimate the value function of a given policy for a Markov decision process, which is one of the most popular and fundamental reinforcement learning algorithms. While there has been a series of successful works in theoretical analysis of TD-learning, it was not until recently that researchers found some guarantees on its statistical efficiency. In this paper, we propose a control theoretic finite-time analysis TD-learning, which exploits standard notions in linear system control communities. Therefore, the proposed work provides additional insights on TD-learning and reinforcement learning with simple concepts and analysis tools in control theory.
    Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning. (arXiv:2204.03597v1 [cs.LG])
    The goal of imitation learning is to mimic expert behavior from demonstrations, without access to an explicit reward signal. A popular class of approach infers the (unknown) reward function via inverse reinforcement learning (IRL) followed by maximizing this reward function via reinforcement learning (RL). The policies learned via these approaches are however very brittle in practice and deteriorate quickly even with small test-time perturbations due to compounding errors. We propose Imitation with Planning at Test-time (IMPLANT), a new meta-algorithm for imitation learning that utilizes decision-time planning to correct for compounding errors of any base imitation policy. In contrast to existing approaches, we retain both the imitation policy and the rewards model at decision-time, thereby benefiting from the learning signal of the two components. Empirically, we demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments and excels at zero-shot generalization when subject to challenging perturbations in test-time dynamics.
    Understanding Dynamics of Nonlinear Representation Learning and Its Application. (arXiv:2106.14836v3 [cs.LG] UPDATED)
    Representations of the world environment play a crucial role in artificial intelligence. It is often inefficient to conduct reasoning and inference directly in the space of raw sensory representations, such as pixel values of images. Representation learning allows us to automatically discover suitable representations from raw sensory data. For example, given raw sensory data, a deep neural network learns nonlinear representations at its hidden layers, which are subsequently used for classification at its output layer. This happens implicitly during training through minimizing a supervised or unsupervised loss. In this paper, we study the dynamics of such implicit nonlinear representation learning. We identify a pair of a new assumption and a novel condition, called the common model structure assumption and the data-architecture alignment condition. Under the common model structure assumption, the data-architecture alignment condition is shown to be sufficient for the global convergence and necessary for the global optimality. Moreover, our theory explains how and when increasing the network size does and does not improve the training behaviors in the practical regime. Our results provide practical guidance for designing a model structure: e.g., the common model structure assumption can be used as a justification for using a particular model structure instead of others. We also derive a new training framework, which satisfies the data-architecture alignment condition by automatically modifying any given training algorithm. Given a standard training algorithm, the framework running its modified version is empirically shown to maintain competitive test performances while providing global convergence guarantees for deep residual neural networks with convolutions, skip connections, and batch normalization with datasets, including MNIST, CIFAR-10, CIFAR-100, Semeion, KMNIST and SVHN.
    Discriminability-enforcing loss to improve representation learning. (arXiv:2202.07073v2 [cs.CV] UPDATED)
    During the training process, deep neural networks implicitly learn to represent the input data samples through a hierarchy of features, where the size of the hierarchy is determined by the number of layers. In this paper, we focus on enforcing the discriminative power of the high-level representations, that are typically learned by the deeper layers (closer to the output). To this end, we introduce a new loss term inspired by the Gini impurity, which is aimed at minimizing the entropy (increasing the discriminative power) of individual high-level features with respect to the class labels. Although our Gini loss induces highly-discriminative features, it does not ensure that the distribution of the high-level features matches the distribution of the classes. As such, we introduce another loss term to minimize the Kullback-Leibler divergence between the two distributions. We conduct experiments on two image classification data sets (CIFAR-100 and Caltech 101), considering multiple neural architectures ranging from convolutional networks (ResNet-17, ResNet-18, ResNet-50) to transformers (CvT). Our empirical results show that integrating our novel loss terms into the training objective consistently outperforms the models trained with cross-entropy alone, without increasing the inference time at all.
    Visualizing Deep Neural Networks with Topographic Activation Maps. (arXiv:2204.03528v1 [cs.LG])
    Machine Learning with Deep Neural Networks (DNNs) has become a successful tool in solving tasks across various fields of application. The success of DNNs is strongly connected to their high complexity in terms of the number of network layers or of neurons in each layer, which severely complicates to understand how DNNs solve their learned task. To improve the explainability of DNNs, we adapt methods from neuroscience because this field has a rich experience in analyzing complex and opaque systems. In this work, we draw inspiration from how neuroscience uses topographic maps to visualize the activity of the brain when it performs certain tasks. Transferring this approach to DNNs can help to visualize and understand their internal processes more intuitively, too. However, the inner structures of brains and DNNs differ substantially. Therefore, to be able to visualize activations of neurons in DNNs as topographic maps, we research techniques to layout the neurons in a two-dimensional space in which neurons of similar activity are in the vicinity of each other. In this work, we introduce and compare different methods to obtain a topographic layout of the neurons in a network layer. Moreover, we demonstrate how to use the resulting topographic activation maps to identify errors or encoded biases in DNNs or data sets. Our novel visualization technique improves the transparency of DNN-based algorithmic decision-making systems and is accessible to a broad audience because topographic maps are intuitive to interpret without expert-knowledge in Machine Learning.
    Deep learning method for identifying mass composition of ultra-high-energy cosmic rays. (arXiv:2112.02072v2 [astro-ph.IM] UPDATED)
    We introduce a novel method for identifying the mass composition of ultra-high-energy cosmic rays using deep learning. The key idea of the method is to use a chain of two neural networks. The first network predicts the type of a primary particle for individual events, while the second infers the mass composition of an ensemble of events. We apply this method to the Monte-Carlo data for the Telescope Array Surface Detectors readings, on which it yields an unprecedented low error of 7% for 4-component approximation. We also discuss the problems of applying the developed method to the experimental data, and the way they can be resolved.
    Learning and Transferring Value Function for Robot Exploration in Subterranean Environments. (arXiv:2204.03140v1 [cs.RO])
    In traditional robot exploration methods, the robot usually does not have prior biases about the environment it is exploring. Thus the robot assigns equal importance to the goals which leads to insufficient exploration efficiency. Alternative, often a hand-tuned policy is used to tweak the value of goals. In this paper, we present a method to learn how "good" some states are, measured by the state value function, to provide a hint for the robot to make exploration decisions. We propose to learn state value functions from previous offline collected datasets and then transfer and improve the value function during testing in a new environment. Moreover, the environments usually have very few and even no extrinsic reward or feedback for the robot. Therefore in this work, we also tackle the problem of sparse extrinsic rewards from the environments. We design several intrinsic rewards to encourage the robot to obtain more information during exploration. These reward functions then become the building blocks of the state value functions. We test our method on challenging subterranean and urban environments. To the best of our knowledge, this work for the first time demonstrates value function prediction with previous collected datasets to help exploration in challenging subterranean environments.
    Security Aspects of Quantum Machine Learning: Opportunities, Threats and Defenses. (arXiv:2204.03625v1 [cs.CR])
    In the last few years, quantum computing has experienced a growth spurt. One exciting avenue of quantum computing is quantum machine learning (QML) which can exploit the high dimensional Hilbert space to learn richer representations from limited data and thus can efficiently solve complex learning tasks. Despite the increased interest in QML, there have not been many studies that discuss the security aspects of QML. In this work, we explored the possible future applications of QML in the hardware security domain. We also expose the security vulnerabilities of QML and emerging attack models, and corresponding countermeasures.
    Heterogeneous Target Speech Separation. (arXiv:2204.03594v1 [cs.SD])
    We introduce a new paradigm for single-channel target source separation where the sources of interest can be distinguished using non-mutually exclusive concepts (e.g., loudness, gender, language, spatial location, etc). Our proposed heterogeneous separation framework can seamlessly leverage datasets with large distribution shifts and learn cross-domain representations under a variety of concepts used as conditioning. Our experiments show that training separation models with heterogeneous conditions facilitates the generalization to new concepts with unseen out-of-domain data while also performing substantially higher than single-domain specialist models. Notably, such training leads to more robust learning of new harder source separation discriminative concepts and can yield improvements over permutation invariant training with oracle source selection. We analyze the intrinsic behavior of source separation training with heterogeneous metadata and propose ways to alleviate emerging problems with challenging separation conditions. We release the collection of preparation recipes for all datasets used to further promote research towards this challenging task.
    Covariate-assisted Sparse Tensor Completion. (arXiv:2103.06428v3 [stat.ML] UPDATED)
    We aim to provably complete a sparse and highly-missing tensor in the presence of covariate information along tensor modes. Our motivation comes from online advertising where users click-through-rates (CTR) on ads over various devices form a CTR tensor that has about 96% missing entries and has many zeros on non-missing entries, which makes the standalone tensor completion method unsatisfactory. Beside the CTR tensor, additional ad features or user characteristics are often available. In this paper, we propose Covariate-assisted Sparse Tensor Completion (COSTCO) to incorporate covariate information for the recovery of the sparse tensor. The key idea is to jointly extract latent components from both the tensor and the covariate matrix to learn a synthetic representation. Theoretically, we derive the error bound for the recovered tensor components and explicitly quantify the improvements on both the reveal probability condition and the tensor recovery accuracy due to covariates. Finally, we apply COSTCO to an advertisement dataset consisting of a CTR tensor and ad covariate matrix, leading to 23% accuracy improvement over the baseline. An important by-product is that ad latent components from COSTCO reveal interesting ad clusters, which are useful for better ad targeting.
    Amortized Auto-Tuning: Cost-Efficient Bayesian Transfer Optimization for Hyperparameter Recommendation. (arXiv:2106.09179v2 [cs.LG] UPDATED)
    With the surge in the number of hyperparameters and training times of modern machine learning models, hyperparameter tuning is becoming increasingly expensive. However, after assessing 40 tuning methods systematically, we find that each faces certain limitations. In particular, methods that speed up tuning via knowledge transfer typically require the final performance of hyperparameters and do not focus on low-fidelity information. As we demonstrate empirically, this common practice is suboptimal and can incur an unnecessary use of resources. It is more cost-efficient to instead leverage low-fidelity tuning observations to measure inter-task similarity and transfer knowledge from existing to new tasks accordingly. However, performing multi-fidelity tuning comes with its own challenges in the transfer setting: the noise in additional observations and the need for performance forecasting. Therefore, we propose and conduct a thorough analysis of a multi-task multi-fidelity Bayesian optimization framework, which leads to the best instantiation--amortized auto-tuning (AT2). We further present an offline-computed 27-task hyperparameter recommendation (HyperRec) database to serve the community. Extensive experiments on HyperRec and other real-world databases illustrate the effectiveness of our AT2 method.
    DAIS: Automatic Channel Pruning via Differentiable Annealing Indicator Search. (arXiv:2011.02166v2 [cs.CV] UPDATED)
    The convolutional neural network has achieved great success in fulfilling computer vision tasks despite large computation overhead against efficient deployment. Structured (channel) pruning is usually applied to reduce the model redundancy while preserving the network structure, such that the pruned network can be easily deployed in practice. However, existing structured pruning methods require hand-crafted rules which may lead to tremendous pruning space. In this paper, we introduce Differentiable Annealing Indicator Search (DAIS) that leverages the strength of neural architecture search in the channel pruning and automatically searches for the effective pruned model with given constraints on computation overhead. Specifically, DAIS relaxes the binarized channel indicators to be continuous and then jointly learns both indicators and model parameters via bi-level optimization. To bridge the non-negligible discrepancy between the continuous model and the target binarized model, DAIS proposes an annealing-based procedure to steer the indicator convergence towards binarized states. Moreover, DAIS designs various regularizations based on a priori structural knowledge to control the pruning sparsity and to improve model performance. Experimental results show that DAIS outperforms state-of-the-art pruning methods on CIFAR-10, CIFAR-100, and ImageNet.
    Variational Autoencoder based Metamodeling for Multi-Objective Topology Optimization of Electrical Machines. (arXiv:2201.08877v2 [cs.LG] UPDATED)
    Conventional magneto-static finite element analysis of electrical machine design is time-consuming and computationally expensive. Since each machine topology has a distinct set of parameters, design optimization is commonly performed independently. This paper presents a novel method for predicting Key Performance Indicators (KPIs) of differently parameterized electrical machine topologies at the same time by mapping a high dimensional integrated design parameters in a lower dimensional latent space using a variational autoencoder. After training, via a latent space, the decoder and multi-layer neural network will function as meta-models for sampling new designs and predicting associated KPIs, respectively. This enables parameter-based concurrent multi-topology optimization.
    End-To-End Optimization of Online Neural Network-supported Two-Stage Dereverberation for Hearing Devices. (arXiv:2204.02978v1 [eess.AS])
    A two-stage online dereverberation algorithm for hearing devices is presented in this paper. The approach combines a multi-channel multi-frame linear filtering approach with a single-channel single-frame post-filter. Both components rely on power spectral density (PSD) estimates provided by deep neural networks (DNNs). This contribution extends our prior work, which shows that directly optimizing for a criterion at the output of the multi-channel linear filtering stage results in a more efficient dereverberation, as compared to placing the criterion at the output of the DNN to optimize the PSD estimation. In the present work, we show that the dereverberation performance of the proposed first stage particularly improves the early-to-mid reverberation ratio if trained end-to-end. We thus argue that it can be combined with a post-filtering stage which benefits from the early-to-mid ratio improvement and is consequently able to efficiently suppress the residual late reverberation. This proposed two stage procedure is shown to be both very effective in terms of dereverberation performance and computational demands. Furthermore, the proposed system can be adapted to the needs of different types of hearing-device users by controlling the amount of reduction of early reflections. The proposed system outperforms the previously proposed end-to-end DNN-supported linear filtering algorithm, as well as other traditional approaches, based on an evaluation using the noise-free version of the WHAMR! dataset.
    A comparison of mixed-variables Bayesian optimization approaches. (arXiv:2111.01533v2 [math.OC] UPDATED)
    Most real optimization problems are defined over a mixed search space where the variables are both discrete and continuous. In engineering applications, the objective function is typically calculated with a numerically costly black-box simulation.General mixed and costly optimization problems are therefore of a great practical interest, yet their resolution remains in a large part an open scientific question. In this article, costly mixed problems are approached through Gaussian processes where the discrete variables are relaxed into continuous latent variables. The continuous space is more easily harvested by classical Bayesian optimization techniques than a mixed space would. Discrete variables are recovered either subsequently to the continuous optimization, or simultaneously with an additional continuous-discrete compatibility constraint that is handled with augmented Lagrangians. Several possible implementations of such Bayesian mixed optimizers are compared. In particular, the reformulation of the problem with continuous latent variables is put in competition with searches working directly in the mixed space. Among the algorithms involving latent variables and an augmented Lagrangian, a particular attention is devoted to the Lagrange multipliers for which a local and a global estimation techniques are studied. The comparisons are based on the repeated optimization of three analytical functions and a beam design problem.
    Concentration Network for Reinforcement Learning of Large-Scale Multi-Agent Systems. (arXiv:2203.06416v2 [cs.AI] UPDATED)
    When dealing with a series of imminent issues, humans can naturally concentrate on a subset of these concerning issues by prioritizing them according to their contributions to motivational indices, e.g., the probability of winning a game. This idea of concentration offers insights into reinforcement learning of sophisticated Large-scale Multi-Agent Systems (LMAS) participated by hundreds of agents. In such an LMAS, each agent receives a long series of entity observations at each step, which can overwhelm existing aggregation networks such as graph attention networks and cause inefficiency. In this paper, we propose a concentration network called ConcNet. First, ConcNet scores the observed entities considering several motivational indices, e.g., expected survival time and state value of the agents, and then ranks, prunes, and aggregates the encodings of observed entities to extract features. Second, distinct from the well-known attention mechanism, ConcNet has a unique motivational subnetwork to explicitly consider the motivational indices when scoring the observed entities. Furthermore, we present a concentration policy gradient architecture that can learn effective policies in LMAS from scratch. Extensive experiments demonstrate that the presented architecture has excellent scalability and flexibility, and significantly outperforms existing methods on LMAS benchmarks.  ( 2 min )
    On the Limitations of Multimodal VAEs. (arXiv:2110.04121v2 [cs.LG] UPDATED)
    Multimodal variational autoencoders (VAEs) have shown promise as efficient generative models for weakly-supervised data. Yet, despite their advantage of weak supervision, they exhibit a gap in generative quality compared to unimodal VAEs, which are completely unsupervised. In an attempt to explain this gap, we uncover a fundamental limitation that applies to a large family of mixture-based multimodal VAEs. We prove that the sub-sampling of modalities enforces an undesirable upper bound on the multimodal ELBO and thereby limits the generative quality of the respective models. Empirically, we showcase the generative quality gap on both synthetic and real data and present the tradeoffs between different variants of multimodal VAEs. We find that none of the existing approaches fulfills all desired criteria of an effective multimodal generative model when applied on more complex datasets than those used in previous benchmarks. In summary, we identify, formalize, and validate fundamental limitations of VAE-based approaches for modeling weakly-supervised data and discuss implications for real-world applications.
    Towards Improving Selective Prediction Ability of NLP Systems. (arXiv:2008.09371v3 [cs.CL] UPDATED)
    It's better to say "I can't answer" than to answer incorrectly. This selective prediction ability is crucial for NLP systems to be reliably deployed in real-world applications. Prior work has shown that existing selective prediction techniques fail to perform well, especially in the out-of-domain setting. In this work, we propose a method that improves probability estimates of models by calibrating them using prediction confidence and difficulty score of instances. Using these two signals, we first annotate held-out instances and then train a calibrator to predict the likelihood of correctness of the model's prediction. We instantiate our method with Natural Language Inference (NLI) and Duplicate Detection (DD) tasks and evaluate it in both In-Domain (IID) and Out-of-Domain (OOD) settings. In (IID, OOD) settings, we show that the representations learned by our calibrator result in an improvement of (15.81%, 5.64%) and (6.19%, 13.9%) over 'MaxProb' -- a selective prediction baseline -- on NLI and DD tasks respectively.
    Multi-Sample $\zeta$-mixup: Richer, More Realistic Synthetic Samples from a $p$-Series Interpolant. (arXiv:2204.03323v1 [cs.LG])
    Modern deep learning training procedures rely on model regularization techniques such as data augmentation methods, which generate training samples that increase the diversity of data and richness of label information. A popular recent method, mixup, uses convex combinations of pairs of original samples to generate new samples. However, as we show in our experiments, mixup can produce undesirable synthetic samples, where the data is sampled off the manifold and can contain incorrect labels. We propose $\zeta$-mixup, a generalization of mixup with provably and demonstrably desirable properties that allows convex combinations of $N \geq 2$ samples, leading to more realistic and diverse outputs that incorporate information from $N$ original samples by using a $p$-series interpolant. We show that, compared to mixup, $\zeta$-mixup better preserves the intrinsic dimensionality of the original datasets, which is a desirable property for training generalizable models. Furthermore, we show that our implementation of $\zeta$-mixup is faster than mixup, and extensive evaluation on controlled synthetic and 24 real-world natural and medical image classification datasets shows that $\zeta$-mixup outperforms mixup and traditional data augmentation techniques.
    Inference over radiative transfer models using variational and expectation maximization methods. (arXiv:2204.03346v1 [cs.LG])
    Earth observation from satellites offers the possibility to monitor our planet with unprecedented accuracy. Radiative transfer models (RTMs) encode the energy transfer through the atmosphere, and are used to model and understand the Earth system, as well as to estimate the parameters that describe the status of the Earth from satellite observations by inverse modeling. However, performing inference over such simulators is a challenging problem. RTMs are nonlinear, non-differentiable and computationally costly codes, which adds a high level of difficulty in inference. In this paper, we introduce two computational techniques to infer not only point estimates of biophysical parameters but also their joint distribution. One of them is based on a variational autoencoder approach and the second one is based on a Monte Carlo Expectation Maximization (MCEM) scheme. We compare and discuss benefits and drawbacks of each approach. We also provide numerical comparisons in synthetic simulations and the real PROSAIL model, a popular RTM that combines land vegetation leaf and canopy modeling. We analyze the performance of the two approaches for modeling and inferring the distribution of three key biophysical parameters for quantifying the terrestrial biosphere.
    Data Justice Stories: A Repository of Case Studies. (arXiv:2204.03100v1 [cs.CY])
    The idea of "data justice" is of recent academic vintage. It has arisen over the past decade in Anglo-European research institutions as an attempt to bring together a critique of the power dynamics that underlie accelerating trends of datafication with a normative commitment to the principles of social justice-a commitment to the achievement of a society that is equitable, fair, and capable of confronting the root causes of injustice.However, despite the seeming novelty of such a data justice pedigree, this joining up of the critique of the power imbalances that have shaped the digital and "big data" revolutions with a commitment to social equity and constructive societal transformation has a deeper historical, and more geographically diverse, provenance. As the stories of the data justice initiatives, activism, and advocacy contained in this volume well evidence, practices of data justice across the globe have, in fact, largely preceded the elaboration and crystallisation of the idea of data justice in contemporary academic discourse. In telling these data justice stories, we hope to provide the reader with two interdependent tools of data justice thinking: First, we aim to provide the reader with the critical leverage needed to discern those distortions and malformations of data justice that manifest in subtle and explicit forms of power, domination, and coercion. Second, we aim to provide the reader with access to the historically effective forms of normativity and ethical insight that have been marshalled by data justice activists and advocates as tools of societal transformation-so that these forms of normativity and insight can be drawn on, in turn, as constructive resources to spur future transformative data justice practices.
    Correcting Misproducted Speech using Spectrogram Inpainting. (arXiv:2204.03379v1 [eess.AS])
    Learning a new language involves constantly comparing speech productions with reference productions from the environment. Early in speech acquisition, children make articulatory adjustments to match their caregivers' speech. Grownup learners of a language tweak their speech to match the tutor reference. This paper proposes a method to synthetically generate correct pronunciation feedback given incorrect production. Furthermore, our aim is to generate the corrected production while maintaining the speaker's original voice. The system prompts the user to pronounce a phrase. The speech is recorded, and the samples associated with the inaccurate phoneme are masked with zeros. This waveform serves as an input to a speech generator, implemented as a deep learning inpainting system with a U-net architecture, and trained to output a reconstructed speech. The training set is composed of unimpaired proper speech examples, and the generator is trained to reconstruct the original proper speech. We evaluated the performance of our system on phoneme replacement of minimal pair words of English as well as on children with pronunciation disorders. Results suggest that human listeners slightly prefer our generated speech over a smoothed replacement of the inaccurate phoneme with a production of a different speaker.
    Accelerating Attention through Gradient-Based Learned Runtime Pruning. (arXiv:2204.03227v1 [cs.CL])
    Self-attention is a key enabler of state-of-art accuracy for various transformer-based Natural Language Processing models. This attention mechanism calculates a correlation score for each word with respect to the other words in a sentence. Commonly, only a small subset of words highly correlates with the word under attention, which is only determined at runtime. As such, a significant amount of computation is inconsequential due to low attention scores and can potentially be pruned. The main challenge is finding the threshold for the scores below which subsequent computation will be inconsequential. Although such a threshold is discrete, this paper formulates its search through a soft differentiable regularizer integrated into the loss function of the training. This formulation piggy backs on the back-propagation training to analytically co-optimize the threshold and the weights simultaneously, striking a formally optimal balance between accuracy and computation pruning. To best utilize this mathematical innovation, we devise a bit-serial architecture, dubbed LeOPArd, for transformer language models with bit-level early termination microarchitectural mechanism. We evaluate our design across 43 back-end tasks for MemN2N, BERT, ALBERT, GPT-2, and Vision transformer models. Post-layout results show that, on average, LeOPArd yields 1.9x and 3.9x speedup and energy reduction, respectively, while keeping the average accuracy virtually intact (<0.2% degradation)
    Pin the Memory: Learning to Generalize Semantic Segmentation. (arXiv:2204.03609v1 [cs.CV])
    The rise of deep neural networks has led to several breakthroughs for semantic segmentation. In spite of this, a model trained on source domain often fails to work properly in new challenging domains, that is directly concerned with the generalization capability of the model. In this paper, we present a novel memory-guided domain generalization method for semantic segmentation based on meta-learning framework. Especially, our method abstracts the conceptual knowledge of semantic classes into categorical memory which is constant beyond the domains. Upon the meta-learning concept, we repeatedly train memory-guided networks and simulate virtual test to 1) learn how to memorize a domain-agnostic and distinct information of classes and 2) offer an externally settled memory as a class-guidance to reduce the ambiguity of representation in the test data of arbitrary unseen domain. To this end, we also propose memory divergence and feature cohesion losses, which encourage to learn memory reading and update processes for category-aware domain generalization. Extensive experiments for semantic segmentation demonstrate the superior generalization capability of our method over state-of-the-art works on various benchmarks.
    Class-Incremental Learning with Strong Pre-trained Models. (arXiv:2204.03634v1 [cs.CV])
    Class-incremental learning (CIL) has been widely studied under the setting of starting from a small number of classes (base classes). Instead, we explore an understudied real-world setting of CIL that starts with a strong model pre-trained on a large number of base classes. We hypothesize that a strong base model can provide a good representation for novel classes and incremental learning can be done with small adaptations. We propose a 2-stage training scheme, i) feature augmentation -- cloning part of the backbone and fine-tuning it on the novel data, and ii) fusion -- combining the base and novel classifiers into a unified classifier. Experiments show that the proposed method significantly outperforms state-of-the-art CIL methods on the large-scale ImageNet dataset (e.g. +10% overall accuracy than the best). We also propose and analyze understudied practical CIL scenarios, such as base-novel overlap with distribution shift. Our proposed method is robust and generalizes to all analyzed CIL settings.
    GNNLens: A Visual Analytics Approach for Prediction Error Diagnosis of Graph Neural Networks. (arXiv:2011.11048v6 [cs.HC] UPDATED)
    Graph Neural Networks (GNNs) aim to extend deep learning techniques to graph data and have achieved significant progress in graph analysis tasks (e.g., node classification) in recent years. However, similar to other deep neural networks like Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), GNNs behave like a black box with their details hidden from model developers and users. It is therefore difficult to diagnose possible errors of GNNs. Despite many visual analytics studies being done on CNNs and RNNs, little research has addressed the challenges for GNNs. This paper fills the research gap with an interactive visual analysis tool, GNNLens, to assist model developers and users in understanding and analyzing GNNs. Specifically, Parallel Sets View and Projection View enable users to quickly identify and validate error patterns in the set of wrong predictions; Graph View and Feature Matrix View offer a detailed analysis of individual nodes to assist users in forming hypotheses about the error patterns. Since GNNs jointly model the graph structure and the node features, we reveal the relative influences of the two types of information by comparing the predictions of three models: GNN, Multi-Layer Perceptron (MLP), and GNN Without Using Features (GNNWUF). Two case studies and interviews with domain experts demonstrate the effectiveness of GNNLens in facilitating the understanding of GNN models and their errors.
    Joint Adaptive Graph and Structured Sparsity Regularization for Unsupervised Feature Selection. (arXiv:2010.05454v3 [cs.LG] UPDATED)
    Feature selection is an important data preprocessing in data mining and machine learning which can be used to reduce the feature dimension without deteriorating model's performance. Since obtaining annotated data is laborious or even infeasible in many cases, unsupervised feature selection is more practical in reality. Though lots of methods for unsupervised feature selection have been proposed, these methods select features independently, thus it is no guarantee that the group of selected features is optimal. What's more, the number of selected features must be tuned carefully to obtain a satisfactory result. To tackle these problems, we propose a joint adaptive graph and structured sparsity regularization unsupervised feature selection (JASFS) method in this paper, in which a $l_{2,0}$-norm regularization term with respect to transformation matrix is imposed in the manifold learning for feature selection, and a graph regularization term is incorporated into the learning model to learn the local geometric structure of data adaptively. An efficient and simple iterative algorithm is designed to solve the proposed optimization problem with the analysis of computational complexity. After optimized, a subset of optimal features will be selected in group, and the number of selected features will be determined automatically. Experimental results on eight benchmarks demonstrate the effectiveness and efficiency of the proposed method compared with several state-of-the-art approaches.
    Fast Design Space Exploration of Nonlinear Systems: Part I. (arXiv:2104.01747v7 [cs.LG] UPDATED)
    System design tools are often only available as input-output blackboxes: for a given design as input they compute an output representing system behavior. Blackboxes are intended to be run in the forward direction. This paper presents a new method of solving the inverse design problem namely, given requirements or constraints on output, find an input that also optimizes an objective function. This problem is challenging for several reasons. First, blackboxes are not designed to be run in reverse. Second, inputs and outputs can be discrete and continuous. Third, finding designs concurrently satisfying a set of requirements is hard because designs satisfying individual requirements may conflict with each other. Fourth, blackbox evaluations can be expensive. Finally, blackboxes can sometimes fail to produce an output. This paper presents CNMA, a new method of solving the inverse problem that overcomes these challenges. CNMA tries to sample only the part of the design space relevant to solving the problem, leveraging the power of neural networks, Mixed Integer Linear Programs, and a new learning-from-failure feedback loop. The paper also presents a parallel version of CNMA that improves the efficiency and quality of solutions over the sequential version, and tries to steer it away from local optima. CNMA's performance is evaluated against conventional optimization methods for seven nonlinear design problems of 8 (two problems), 10, 15, 36 and 60 real-valued dimensions and one with 186 binary dimensions. Conventional methods evaluated are off-the-shelf implementations of Bayesian Optimization with Gaussian Processes, Nelder Mead and Random Search. The first two do not solve problems that are high-dimensional, have discrete and continuous variables or whose blackboxes can fail to return values. CNMA solves all problems, and surpasses the performance of conventional methods by up to 87%.
    DeepTensor: Low-Rank Tensor Decomposition with Deep Network Priors. (arXiv:2204.03145v1 [stat.AP])
    DeepTensor is a computationally efficient framework for low-rank decomposition of matrices and tensors using deep generative networks. We decompose a tensor as the product of low-rank tensor factors (e.g., a matrix as the outer product of two vectors), where each low-rank tensor is generated by a deep network (DN) that is trained in a self-supervised manner to minimize the mean-squared approximation error. Our key observation is that the implicit regularization inherent in DNs enables them to capture nonlinear signal structures (e.g., manifolds) that are out of the reach of classical linear methods like the singular value decomposition (SVD) and principal component analysis (PCA). Furthermore, in contrast to the SVD and PCA, whose performance deteriorates when the tensor's entries deviate from additive white Gaussian noise, we demonstrate that the performance of DeepTensor is robust to a wide range of distributions. We validate that DeepTensor is a robust and computationally efficient drop-in replacement for the SVD, PCA, nonnegative matrix factorization (NMF), and similar decompositions by exploring a range of real-world applications, including hyperspectral image denoising, 3D MRI tomography, and image classification. In particular, DeepTensor offers a 6dB signal-to-noise ratio improvement over standard denoising methods for signals corrupted by Poisson noise and learns to decompose 3D tensors 60 times faster than a single DN equipped with 3D convolutions.
    MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids. (arXiv:2204.03305v1 [eess.AS])
    Improving the user's hearing ability to understand speech in noisy environments is critical to the development of hearing aid (HA) devices. For this, it is important to derive a metric that can fairly predict speech intelligibility for HA users. A straightforward approach is to conduct a subjective listening test and use the test results as an evaluation metric. However, conducting large-scale listening tests is time-consuming and expensive. Therefore, several evaluation metrics were derived as surrogates for subjective listening test results. In this study, we propose a multi-branched speech intelligibility prediction model (MBI-Net), for predicting the subjective intelligibility scores of HA users. MBI-Net consists of two branches of models, with each branch consisting of a hearing loss model, a cross-domain feature extraction module, and a speech intelligibility prediction model, to process speech signals from one channel. The outputs of the two branches are fused through a linear layer to obtain predicted speech intelligibility scores. Experimental results confirm the effectiveness of MBI-Net, which produces higher prediction scores than the baseline system in Track 1 and Track 2 on the Clarity Prediction Challenge 2022 dataset.
    Evaluating Pre-Trained Models for User Feedback Analysis in Software Engineering: A Study on Classification of App-Reviews. (arXiv:2104.05861v3 [cs.SE] UPDATED)
    Context: Mobile app reviews written by users on app stores or social media are significant resources for app developers.Analyzing app reviews have proved to be useful for many areas of software engineering (e.g., requirement engineering, testing). Automatic classification of app reviews requires extensive efforts to manually curate a labeled dataset. When the classification purpose changes (e.g. identifying bugs versus usability issues or sentiment), new datasets should be labeled, which prevents the extensibility of the developed models for new desired classes/tasks in practice. Recent pre-trained neural language models (PTM) are trained on large corpora in an unsupervised manner and have found success in solving similar Natural Language Processing problems. However, the applicability of PTMs is not explored for app review classification Objective: We investigate the benefits of PTMs for app review classification compared to the existing models, as well as the transferability of PTMs in multiple settings. Method: We empirically study the accuracy and time efficiency of PTMs compared to prior approaches using six datasets from literature. In addition, we investigate the performance of the PTMs trained on app reviews (i.e. domain-specific PTMs) . We set up different studies to evaluate PTMs in multiple settings: binary vs. multi-class classification, zero-shot classification (when new labels are introduced to the model), multi-task setting, and classification of reviews from different resources. The datasets are manually labeled app review datasets from Google Play Store, Apple App Store, and Twitter data. In all cases, Micro and Macro Precision, Recall, and F1-scores will be used and we will report the time required for training and prediction with the models.
    Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning. (arXiv:2004.10888v6 [cs.LG] UPDATED)
    We present a mean-variance policy iteration (MVPI) framework for risk-averse control in a discounted infinite horizon MDP optimizing the variance of a per-step reward random variable. MVPI enjoys great flexibility in that any policy evaluation method and risk-neutral control method can be dropped in for risk-averse control off the shelf, in both on- and off-policy settings. This flexibility reduces the gap between risk-neutral control and risk-averse control and is achieved by working on a novel augmented MDP directly. We propose risk-averse TD3 as an example instantiating MVPI, which outperforms vanilla TD3 and many previous risk-averse control methods in challenging Mujoco robot simulation tasks under a risk-aware performance metric. This risk-averse TD3 is the first to introduce deterministic policies and off-policy learning into risk-averse reinforcement learning, both of which are key to the performance boost we show in Mujoco domains.
    Differentially Private Set Union. (arXiv:2002.09745v2 [cs.CR] UPDATED)
    We study the basic operation of set union in the global model of differential privacy. In this problem, we are given a universe $U$ of items, possibly of infinite size, and a database $D$ of users. Each user $i$ contributes a subset $W_i \subseteq U$ of items. We want an ($\epsilon$,$\delta$)-differentially private algorithm which outputs a subset $S \subset \cup_i W_i$ such that the size of $S$ is as large as possible. The problem arises in countless real world applications; it is particularly ubiquitous in natural language processing (NLP) applications as vocabulary extraction. For example, discovering words, sentences, $n$-grams etc., from private text data belonging to users is an instance of the set union problem. Known algorithms for this problem proceed by collecting a subset of items from each user, taking the union of such subsets, and disclosing the items whose noisy counts fall above a certain threshold. Crucially, in the above process, the contribution of each individual user is always independent of the items held by other users, resulting in a wasteful aggregation process, where some item counts happen to be way above the threshold. We deviate from the above paradigm by allowing users to contribute their items in a $\textit{dependent fashion}$, guided by a $\textit{policy}$. In this new setting ensuring privacy is significantly delicate. We prove that any policy which has certain $\textit{contractive}$ properties would result in a differentially private algorithm. We design two new algorithms, one using Laplace noise and other Gaussian noise, as specific instances of policies satisfying the contractive properties. Our experiments show that the new algorithms significantly outperform previously known mechanisms for the problem.
    Delta Keyword Transformer: Bringing Transformers to the Edge through Dynamically Pruned Multi-Head Self-Attention. (arXiv:2204.03479v1 [cs.CL])
    Multi-head self-attention forms the core of Transformer networks. However, their quadratically growing complexity with respect to the input sequence length impedes their deployment on resource-constrained edge devices. We address this challenge by proposing a dynamic pruning method, which exploits the temporal stability of data across tokens to reduce inference cost. The threshold-based method only retains significant differences between the subsequent tokens, effectively reducing the number of multiply-accumulates, as well as the internal tensor data sizes. The approach is evaluated on the Google Speech Commands Dataset for keyword spotting, and the performance is compared against the baseline Keyword Transformer. Our experiments show that we can reduce ~80% of operations while maintaining the original 98.4% accuracy. Moreover, a reduction of ~87-94% operations can be achieved when only degrading the accuracy by 1-4%, speeding up the multi-head self-attention inference by a factor of ~7.5-16.
    Contrastive Learning Inverts the Data Generating Process. (arXiv:2102.08850v4 [cs.LG] UPDATED)
    Contrastive learning has recently seen tremendous success in self-supervised learning. So far, however, it is largely unclear why the learned representations generalize so effectively to a large variety of downstream tasks. We here prove that feedforward models trained with objectives belonging to the commonly used InfoNCE family learn to implicitly invert the underlying generative model of the observed data. While the proofs make certain statistical assumptions about the generative model, we observe empirically that our findings hold even if these assumptions are severely violated. Our theory highlights a fundamental connection between contrastive learning, generative modeling, and nonlinear independent component analysis, thereby furthering our understanding of the learned representations as well as providing a theoretical foundation to derive more effective contrastive losses.
    Policy Mirror Descent for Reinforcement Learning: Linear Convergence, New Sampling Complexity, and Generalized Problem Classes. (arXiv:2102.00135v6 [cs.LG] UPDATED)
    We present new policy mirror descent (PMD) methods for solving reinforcement learning (RL) problems with either strongly convex or general convex regularizers. By exploring the structural properties of these overall highly nonconvex problems we show that the PMD methods exhibit fast linear rate of convergence to the global optimality. We develop stochastic counterparts of these methods, and establish an ${\cal O}(1/\epsilon)$ (resp., ${\cal O}(1/\epsilon^2)$) sampling complexity for solving these RL problems with strongly (resp., general) convex regularizers using different sampling schemes, where $\epsilon$ denote the target accuracy. We further show that the complexity for computing the gradients of these regularizers, if necessary, can be bounded by ${\cal O}\{(\log_\gamma \epsilon) [(1-\gamma)L/\mu]^{1/2}\log (1/\epsilon)\}$ (resp., ${\cal O} \{(\log_\gamma \epsilon ) (L/\epsilon)^{1/2}\}$)for problems with strongly (resp., general) convex regularizers. Here $\gamma$ denotes the discounting factor. To the best of our knowledge, these complexity bounds, along with our algorithmic developments, appear to be new in both optimization and RL literature. The introduction of these convex regularizers also greatly expands the flexibility and applicability of RL models.
    Survey on Automated Short Answer Grading with Deep Learning: from Word Embeddings to Transformers. (arXiv:2204.03503v1 [cs.CL])
    Automated short answer grading (ASAG) has gained attention in education as a means to scale educational tasks to the growing number of students. Recent progress in Natural Language Processing and Machine Learning has largely influenced the field of ASAG, of which we survey the recent research advancements. We complement previous surveys by providing a comprehensive analysis of recently published methods that deploy deep learning approaches. In particular, we focus our analysis on the transition from hand engineered features to representation learning approaches, which learn representative features for the task at hand automatically from large corpora of data. We structure our analysis of deep learning methods along three categories: word embeddings, sequential models, and attention-based methods. Deep learning impacted ASAG differently than other fields of NLP, as we noticed that the learned representations alone do not contribute to achieve the best results, but they rather show to work in a complementary way with hand-engineered features. The best performance are indeed achieved by methods that combine the carefully hand-engineered features with the power of the semantic descriptions provided by the latest models, like transformers architectures. We identify challenges and provide an outlook on research direction that can be addressed in the future
    An Overview on Artificial Intelligence Techniques for Diagnosis of Schizophrenia Based on Magnetic Resonance Imaging Modalities: Methods, Challenges, and Future Works. (arXiv:2103.03081v2 [cs.LG] UPDATED)
    Schizophrenia (SZ) is a mental disorder that typically emerges in late adolescence or early adulthood. It reduces the life expectancy of patients by 15 years. Abnormal behavior, perception of emotions, social relationships, and reality perception are among its most significant symptoms. Past studies have revealed the temporal and anterior lobes of hippocampus regions of brain get affected by SZ. Also, increased volume of cerebrospinal fluid (CSF) and decreased volume of white and gray matter can be observed due to this disease. The magnetic resonance imaging (MRI) is the popular neuroimaging technique used to explore structural/functional brain abnormalities in SZ disorder owing to its high spatial resolution. Various artificial intelligence (AI) techniques have been employed with advanced image/signal processing methods to obtain accurate diagnosis of SZ. This paper presents a comprehensive overview of studies conducted on automated diagnosis of SZ using MRI modalities. Main findings, various challenges, and future works in developing the automated SZ detection are described in this paper.
    Unified Contrastive Learning in Image-Text-Label Space. (arXiv:2204.03610v1 [cs.CV])
    Visual recognition is recently learned via either supervised learning on human-annotated image-label data or language-image contrastive learning with webly-crawled image-text pairs. While supervised learning may result in a more discriminative representation, language-image pretraining shows unprecedented zero-shot recognition capability, largely due to the different properties of data sources and learning objectives. In this work, we introduce a new formulation by combining the two data sources into a common image-text-label space. In this space, we propose a new learning paradigm, called Unified Contrastive Learning (UniCL) with a single learning objective to seamlessly prompt the synergy of two data types. Extensive experiments show that our UniCL is an effective way of learning semantically rich yet discriminative representations, universally for image recognition in zero-shot, linear-probe, fully finetuning and transfer learning scenarios. Particularly, it attains gains up to 9.2% and 14.5% in average on zero-shot recognition benchmarks over the language-image contrastive learning and supervised learning methods, respectively. In linear probe setting, it also boosts the performance over the two methods by 7.3% and 3.4%, respectively. Our study also indicates that UniCL stand-alone is a good learner on pure image-label data, rivaling the supervised learning methods across three image classification datasets and two types of vision backbones, ResNet and Swin Transformer. Code is available at https://github.com/microsoft/UniCL.
    A Pathology-Based Machine Learning Method to Assist in Epithelial Dysplasia Diagnosis. (arXiv:2204.03572v1 [eess.IV])
    The Epithelial Dysplasia (ED) is a tissue alteration commonly present in lesions preceding oral cancer, being its presence one of the most important factors in the progression toward carcinoma. This study proposes a method to design a low computational cost classification system to support the detection of dysplastic epithelia, contributing to reduce the variability of pathologist assessments. We employ a multilayer artificial neural network (MLP-ANN) and defining the regions of the epithelium to be assessed based on the knowledge of the pathologist. The performance of the proposed solution was statistically evaluated. The implemented MLP-ANN presented an average accuracy of 87%, with a variability much inferior to that obtained from three trained evaluators. Moreover, the proposed solution led to results which are very close to those obtained using a convolutional neural network (CNN) implemented by transfer learning, with 100 times less computational complexity. In conclusion, our results show that a simple neural network structure can lead to a performance equivalent to that of much more complex structures, which are routinely used in the literature.
    Equivariance Discovery by Learned Parameter-Sharing. (arXiv:2204.03640v1 [cs.LG])
    Designing equivariance as an inductive bias into deep-nets has been a prominent approach to build effective models, e.g., a convolutional neural network incorporates translation equivariance. However, incorporating these inductive biases requires knowledge about the equivariance properties of the data, which may not be available, e.g., when encountering a new domain. To address this, we study how to discover interpretable equivariances from data. Specifically, we formulate this discovery process as an optimization problem over a model's parameter-sharing schemes. We propose to use the partition distance to empirically quantify the accuracy of the recovered equivariance. Also, we theoretically analyze the method for Gaussian data and provide a bound on the mean squared gap between the studied discovery scheme and the oracle scheme. Empirically, we show that the approach recovers known equivariances, such as permutations and shifts, on sum of numbers and spatially-invariant data.
    Bidimensional linked matrix factorization for pan-omics pan-cancer analysis. (arXiv:2002.02601v2 [stat.ML] UPDATED)
    Several modern applications require the integration of multiple large data matrices that have shared rows and/or columns. For example, cancer studies that integrate multiple omics platforms across multiple types of cancer, pan-omics pan-cancer analysis, have extended our knowledge of molecular heterogenity beyond what was observed in single tumor and single platform studies. However, these studies have been limited by available statistical methodology. We propose a flexible approach to the simultaneous factorization and decomposition of variation across such bidimensionally linked matrices, BIDIFAC+. This decomposes variation into a series of low-rank components that may be shared across any number of row sets (e.g., omics platforms) or column sets (e.g., cancer types). This builds on a growing literature for the factorization and decomposition of linked matrices, which has primarily focused on multiple matrices that are linked in one dimension (rows or columns) only. Our objective function extends nuclear norm penalization, is motivated by random matrix theory, gives an identifiable decomposition under relatively mild conditions, and can be shown to give the mode of a Bayesian posterior distribution. We apply BIDIFAC+ to pan-omics pan-cancer data from TCGA, identifying shared and specific modes of variability across 4 different omics platforms and 29 different cancer types.
    Interval Bound Propagation--aided Few-shot Learning. (arXiv:2204.03511v1 [cs.LG])
    Few-shot learning aims to transfer the knowledge acquired from training on a diverse set of tasks, from a given task distribution, to generalize to unseen tasks, from the same distribution, with a limited amount of labeled data. The underlying requirement for effective few-shot generalization is to learn a good representation of the task manifold. One way to encourage this is to preserve local neighborhoods in the feature space learned by the few-shot learner. To this end, we introduce the notion of interval bounds from the provably robust training literature to few-shot learning. The interval bounds are used to characterize neighborhoods around the training tasks. These neighborhoods can then be preserved by minimizing the distance between a task and its respective bounds. We further introduce a novel strategy to artificially form new tasks for training by interpolating between the available tasks and their respective interval bounds, to aid in cases with a scarcity of tasks. We apply our framework to both model-agnostic meta-learning as well as prototype-based metric-learning paradigms. The efficacy of our proposed approach is evident from the improved performance on several datasets from diverse domains in comparison to a sizable number of recent competitors.
    An optimized hybrid solution for IoT based lifestyle disease classification using stress data. (arXiv:2204.03573v1 [eess.SP])
    Stress, anxiety, and nervousness are all high-risk health states in everyday life. Previously, stress levels were determined by speaking with people and gaining insight into what they had experienced recently or in the past. Typically, stress is caused by an incidence that occurred a long time ago, but sometimes it is triggered by unknown factors. This is a challenging and complex task, but recent research advances have provided numerous opportunities to automate it. The fundamental features of most of these techniques are electro dermal activity (EDA) and heart rate values (HRV). We utilized an accelerometer to measure body motions to solve this challenge. The proposed novel method employs a test that measures a subject's electrocardiogram (ECG), galvanic skin values (GSV), HRV values, and body movements in order to provide a low-cost and time-saving solution for detecting stress lifestyle disease in modern times using cyber physical systems. This study provides a new hybrid model for lifestyle disease classification that decreases execution time while picking the best collection of characteristics and increases classification accuracy. The developed approach is capable of dealing with the class imbalance problem by using WESAD (wearable stress and affect dataset) dataset. The new model uses the Grid search (GS) method to select an optimized set of hyper parameters, and it uses a combination of the Correlation coefficient based Recursive feature elimination (CoC-RFE) method for optimal feature selection and gradient boosting as an estimator to classify the dataset, which achieves high accuracy and helps to provide smart, accurate, and high-quality healthcare systems. To demonstrate the validity and utility of the proposed methodology, its performance is compared to those of other well-established machine learning models.
    RL-QN: A Reinforcement Learning Framework for Optimal Control of Queueing Systems. (arXiv:2011.07401v2 [cs.PF] UPDATED)
    With the rapid advance of information technology, network systems have become increasingly complex and hence the underlying system dynamics are often unknown or difficult to characterize. Finding a good network control policy is of significant importance to achieve desirable network performance (e.g., high throughput or low delay). In this work, we consider using model-based reinforcement learning (RL) to learn the optimal control policy for queueing networks so that the average job delay (or equivalently the average queue backlog) is minimized. Traditional approaches in RL, however, cannot handle the unbounded state spaces of the network control problem. To overcome this difficulty, we propose a new algorithm, called Reinforcement Learning for Queueing Networks (RL-QN), which applies model-based RL methods over a finite subset of the state space, while applying a known stabilizing policy for the rest of the states. We establish that the average queue backlog under RL-QN with an appropriately constructed subset can be arbitrarily close to the optimal result. We evaluate RL-QN in dynamic server allocation, routing and switching problems. Simulation results show that RL-QN minimizes the average queue backlog effectively.
    Position-based Prompting for Health Outcome Generation. (arXiv:2204.03489v1 [cs.CL])
    Probing Pre-trained Language Models (PLMs) using prompts has indirectly implied that language models (LMs) can be treated as knowledge bases. To this end, this phenomena has been effective especially when these LMs are fine-tuned towards not just data of a specific domain, but also to the style or linguistic pattern of the prompts themselves. We observe that, satisfying a particular linguistic pattern in prompts is an unsustainable constraint that unnecessarily lengthens the probing task, especially because, they are often manually designed and the range of possible prompt template patterns can vary depending on the prompting objective and domain. We therefore explore an idea of using a position-attention mechanism to capture positional information of each word in a prompt relative to the mask to be filled, hence avoiding the need to re-construct prompts when the prompts linguistic pattern changes. Using our approach, we demonstrate the ability of eliciting answers to rare prompt templates (in a case study on health outcome generation) such as Postfix and Mixed patterns whose missing information is respectively at the start and in multiple random places of the prompt. More so, using various biomedical PLMs, our approach consistently outperforms a baseline in which the default mask language model (MLM) representation is used to predict masked tokens.
    Modeling Label Correlations for Second-Order Semantic Dependency Parsing with Mean-Field Inference. (arXiv:2204.03619v1 [cs.CL])
    Second-order semantic parsing with end-to-end mean-field inference has been shown good performance. In this work we aim to improve this method by modeling label correlations between adjacent arcs. However, direct modeling leads to memory explosion because second-order score tensors have sizes of $O(n^3L^2)$ ($n$ is the sentence length and $L$ is the number of labels), which is not affordable. To tackle this computational challenge, we leverage tensor decomposition techniques, and interestingly, we show that the large second-order score tensors have no need to be materialized during mean-field inference, thereby reducing the computational complexity from cubic to quadratic. We conduct experiments on SemEval 2015 Task 18 English datasets, showing the effectiveness of modeling label correlations. Our code is publicly available at https://github.com/sustcsonglin/mean-field-dep-parsing.
    FedCos: A Scene-adaptive Federated Optimization Enhancement for Performance Improvement. (arXiv:2204.03174v1 [cs.LG])
    As an emerging technology, federated learning (FL) involves training machine learning models over distributed edge devices, which attracts sustained attention and has been extensively studied. However, the heterogeneity of client data severely degrades the performance of FL compared with that in centralized training. It causes the locally trained models of clients to move in different directions. On the one hand, it slows down or even stalls the global updates, leading to inefficient communication. On the other hand, it enlarges the distances between local models, resulting in an aggregated global model with poor performance. Fortunately, these shortcomings can be mitigated by reducing the angle between the directions that local models move in. Based on this fact, we propose FedCos, which reduces the directional inconsistency of local models by introducing a cosine-similarity penalty. It promotes the local model iterations towards an auxiliary global direction. Moreover, our approach is auto-adapt to various non-IID settings without an elaborate selection of hyperparameters. The experimental results show that FedCos outperforms the well-known baselines and can enhance them under a variety of FL scenes, including varying degrees of data heterogeneity, different number of participants, and cross-silo and cross-device settings. Besides, FedCos improves communication efficiency by 2 to 5 times. With the help of FedCos, multiple FL methods require significantly fewer communication rounds than before to obtain a model with comparable performance.
    Distributed NLI: Learning to Predict Human Opinion Distributions for Language Reasoning. (arXiv:2104.08676v2 [cs.CL] UPDATED)
    We introduce distributed NLI, a new NLU task with a goal to predict the distribution of human judgements for natural language inference. We show that by applying additional distribution estimation methods, namely, Monte Carlo (MC) Dropout, Deep Ensemble, Re-Calibration, and Distribution Distillation, models can capture human judgement distribution more effectively than the softmax baseline. We show that MC Dropout is able to achieve decent performance without any distribution annotations while Re-Calibration can give further improvements with extra distribution annotations, suggesting the value of multiple annotations for one example in modeling the distribution of human judgements. Despite these improvements, the best results are still far below the estimated human upper-bound, indicating that predicting the distribution of human judgements is still an open, challenging problem with a large room for improvements. We showcase the common errors for MC Dropout and Re-Calibration. Finally, we give guidelines on the usage of these methods with different levels of data availability and encourage future work on modeling the human opinion distribution for language reasoning. Our code and data are publicly available at https://github.com/easonnie/ChaosNLI
    Statistical Model Criticism of Variational Auto-Encoders. (arXiv:2204.03030v1 [cs.LG])
    We propose a framework for the statistical evaluation of variational auto-encoders (VAEs) and test two instances of this framework in the context of modelling images of handwritten digits and a corpus of English text. Our take on evaluation is based on the idea of statistical model criticism, popular in Bayesian data analysis, whereby a statistical model is evaluated in terms of its ability to reproduce statistics of an unknown data generating process from which we can obtain samples. A VAE learns not one, but two joint distributions over a shared sample space, each exploiting a choice of factorisation that makes sampling tractable in one of two directions (latent-to-data, data-to-latent). We evaluate samples from these distributions, assessing their (marginal) fit to the observed data and our choice of prior, and we also evaluate samples through a pipeline that connects the two distributions starting from a data sample, assessing whether together they exploit and reveal latent factors of variation that are useful to a practitioner. We show that this methodology offers possibilities for model selection qualitatively beyond intrinsic evaluation metrics and at a finer granularity than commonly used statistics can offer.
    Improving Urban Mobility: using artificial intelligence and new technologies to connect supply and demand. (arXiv:2204.03570v1 [cs.CY])
    As the demand for mobility in our society seems to increase, the various issues centered on urban mobility are among those that worry most city inhabitants in this planet. For instance, how to go from A to B in an efficient (but also less stressful) way? These questions and concerns have not changed even during the covid-19 pandemic; on the contrary, as the current stand, people who are avoiding public transportation are only contributing to an increase in the vehicular traffic. The are of intelligent transportation systems (ITS) aims at investigating how to employ information and communication technologies to problems related to transportation. This may mean monitoring and managing the infrastructure (e.g., traffic roads, traffic signals, etc.). However, currently, ITS is also targeting the management of demand. In this panorama, artificial intelligence plays an important role, especially with the advances in machine learning that translates in the use of computational vision, connected and autonomous vehicles, agent-based simulation, among others. In the present work, a survey of several works developed by our group are discussed in a holistic perspective, i.e., they cover not only the supply side (as commonly found in ITS works), but also the demand side, and, in an novel perspective, the integration of both.
    Neural Implicit Flow: a mesh-agnostic dimensionality reduction paradigm of spatio-temporal data. (arXiv:2204.03216v1 [cs.LG])
    High-dimensional spatio-temporal dynamics can often be encoded in a low-dimensional subspace. Engineering applications for modeling, characterization, design, and control of such large-scale systems often rely on dimensionality reduction to make solutions computationally tractable in real-time. Common existing paradigms for dimensionality reduction include linear methods, such as the singular value decomposition (SVD), and nonlinear methods, such as variants of convolutional autoencoders (CAE). However, these encoding techniques lack the ability to efficiently represent the complexity associated with spatio-temporal data, which often requires variable geometry, non-uniform grid resolution, adaptive meshing, and/or parametric dependencies.To resolve these practical engineering challenges, we propose a general framework called Neural Implicit Flow (NIF) that enables a mesh-agnostic, low-rank representation of large-scale, parametric, spatial-temporal data. NIF consists of two modified multilayer perceptrons (MLPs): (i) ShapeNet, which isolates and represents the spatial complexity, and (ii) ParameterNet, which accounts for any other input complexity, including parametric dependencies, time, and sensor measurements. We demonstrate the utility of NIF for parametric surrogate modeling, enabling the interpretable representation and compression of complex spatio-temporal dynamics, efficient many-spatial-query tasks, and improved generalization performance for sparse reconstruction.
    Solving ImageNet: a Unified Scheme for Training any Backbone to Top Results. (arXiv:2204.03475v1 [cs.CV])
    ImageNet serves as the primary dataset for evaluating the quality of computer-vision models. The common practice today is training each architecture with a tailor-made scheme, designed and tuned by an expert. In this paper, we present a unified scheme for training any backbone on ImageNet. The scheme, named USI (Unified Scheme for ImageNet), is based on knowledge distillation and modern tricks. It requires no adjustments or hyper-parameters tuning between different models, and is efficient in terms of training times. We test USI on a wide variety of architectures, including CNNs, Transformers, Mobile-oriented and MLP-only. On all models tested, USI outperforms previous state-of-the-art results. Hence, we are able to transform training on ImageNet from an expert-oriented task to an automatic seamless routine. Since USI accepts any backbone and trains it to top results, it also enables to perform methodical comparisons, and identify the most efficient backbones along the speed-accuracy Pareto curve. Implementation is available at:https://github.com/Alibaba-MIIL/Solving_ImageNet
    Optimizing the Long-Term Behaviour of Deep Reinforcement Learning for Pushing and Grasping. (arXiv:2204.03487v1 [cs.LG])
    We investigate the "Visual Pushing for Grasping" (VPG) system by Zeng et al. and the "Hourglass" system by Ewerton et al., an evolution of the former. The focus of our work is the investigation of the capabilities of both systems to learn long-term rewards and policies. Zeng et al. original task only needs a limited amount of foresight. Ewerton et al. attain their best performance using an agent which only takes the most immediate action under consideration. We are interested in the ability of their models and training algorithms to accurately predict long-term Q-Values. To evaluate this ability, we design a new bin sorting task and reward function. Our task requires agents to accurately estimate future rewards and therefore use high discount factors in their Q-Value calculation. We investigate the behaviour of an adaptation of the VPG training algorithm on our task. We show that this adaptation can not accurately predict the required long-term action sequences. In addition to the limitations identified by Ewerton et al., it suffers from the known Deep Q-Learning problem of overestimated Q-Values. In an effort to solve our task, we turn to the Hourglass models and combine them with the Double Q-Learning approach. We show that this approach enables the models to accurately predict long-term action sequences when trained with large discount factors. Our results show that the Double Q-Learning technique is essential for training with very high discount factors, as the models Q-Value predictions diverge otherwise. We also experiment with different approaches for discount factor scheduling, loss calculation and exploration procedures. Our results show that the latter factors do not visibly influence the model's performance for our task.
    Learning to Compose Soft Prompts for Compositional Zero-Shot Learning. (arXiv:2204.03574v1 [cs.LG])
    We introduce compositional soft prompting (CSP), a parameter-efficient learning technique to improve the zero-shot compositionality of large-scale pretrained vision-language models (VLMs) without the overhead of fine-tuning the entire model. VLMs can represent arbitrary classes as natural language prompts in their flexible text encoders but they underperform state-of-the-art methods on compositional zero-shot benchmark tasks. To improve VLMs, we propose a novel form of soft prompting. We treat the attributes and objects that are composed to define classes as learnable tokens of vocabulary and tune them on multiple prompt compositions. During inference, we recompose the learned attribute-object vocabulary in new combinations and show that CSP outperforms the original VLM on benchmark datasets by an average of 14.7 percentage points of accuracy. CSP also achieves new state-of-the-art accuracies on two out of three benchmark datasets, while only fine-tuning a small number of parameters. Further, we show that CSP improves generalization to higher-order attribute-attribute-object compositions and combinations of pretrained attributes and fine-tuned objects.
    Risk-based regulation for all: The need and a method for a wide adoption solution for data-driven inspection targeting. (arXiv:2204.03583v1 [cs.LG])
    Access to data and data processing, including the use of machine learning techniques, has become significantly easier and cheaper in recent years. Nevertheless, solutions that can be widely adopted by regulators for market monitoring and inspection targeting in a data-driven way have not been frequently discussed by the scientific community. This article discusses the need and the difficulties for the development of such solutions, presents an effective method to address regulation planning, and illustrates its use to account for the most important and common subject for the majority of regulators: the consumer. This article hopes to contribute to increase the awareness of the regulatory community to the need for data processing methods that are objective, impartial, transparent, explainable, simple to implement and with low computational cost, aiming to the implementation of risk-based regulation in the world.
    BERTuit: Understanding Spanish language in Twitter through a native transformer. (arXiv:2204.03465v1 [cs.CL])
    The appearance of complex attention-based language models such as BERT, Roberta or GPT-3 has allowed to address highly complex tasks in a plethora of scenarios. However, when applied to specific domains, these models encounter considerable difficulties. This is the case of Social Networks such as Twitter, an ever-changing stream of information written with informal and complex language, where each message requires careful evaluation to be understood even by humans given the important role that context plays. Addressing tasks in this domain through Natural Language Processing involves severe challenges. When powerful state-of-the-art multilingual language models are applied to this scenario, language specific nuances use to get lost in translation. To face these challenges we present \textbf{BERTuit}, the larger transformer proposed so far for Spanish language, pre-trained on a massive dataset of 230M Spanish tweets using RoBERTa optimization. Our motivation is to provide a powerful resource to better understand Spanish Twitter and to be used on applications focused on this social network, with special emphasis on solutions devoted to tackle the spreading of misinformation in this platform. BERTuit is evaluated on several tasks and compared against M-BERT, XLM-RoBERTa and XLM-T, very competitive multilingual transformers. The utility of our approach is shown with applications, in this case: a zero-shot methodology to visualize groups of hoaxes and profiling authors spreading disinformation. Misinformation spreads wildly on platforms such as Twitter in languages other than English, meaning performance of transformers may suffer when transferred outside English speaking communities.
    Learning to Solve Travelling Salesman Problem with Hardness-adaptive Curriculum. (arXiv:2204.03236v1 [cs.LG])
    Various neural network models have been proposed to tackle combinatorial optimization problems such as the travelling salesman problem (TSP). Existing learning-based TSP methods adopt a simple setting that the training and testing data are independent and identically distributed. However, the existing literature fails to solve TSP instances when training and testing data have different distributions. Concretely, we find that different training and testing distribution will result in more difficult TSP instances, i.e., the solution obtained by the model has a large gap from the optimal solution. To tackle this problem, in this work, we study learning-based TSP methods when training and testing data have different distributions using adaptive-hardness, i.e., how difficult a TSP instance can be for a solver. This problem is challenging because it is non-trivial to (1) define hardness measurement quantitatively; (2) efficiently and continuously generate sufficiently hard TSP instances upon model training; (3) fully utilize instances with different levels of hardness to learn a more powerful TSP solver. To solve these challenges, we first propose a principled hardness measurement to quantify the hardness of TSP instances. Then, we propose a hardness-adaptive generator to generate instances with different hardness. We further propose a curriculum learner fully utilizing these instances to train the TSP solver. Experiments show that our hardness-adaptive generator can generate instances ten times harder than the existing methods, and our proposed method achieves significant improvement over state-of-the-art models in terms of the optimality gap.
    What You See is What You Get: Distributional Generalization for Algorithm Design in Deep Learning. (arXiv:2204.03230v1 [cs.LG])
    We investigate and leverage a connection between Differential Privacy (DP) and the recently proposed notion of Distributional Generalization (DG). Applying this connection, we introduce new conceptual tools for designing deep-learning methods that bypass "pathologies" of standard stochastic gradient descent (SGD). First, we prove that differentially private methods satisfy a "What You See Is What You Get (WYSIWYG)" generalization guarantee: whatever a model does on its train data is almost exactly what it will do at test time. This guarantee is formally captured by distributional generalization. WYSIWYG enables principled algorithm design in deep learning by reducing $\textit{generalization}$ concerns to $\textit{optimization}$ ones: in order to mitigate unwanted behavior at test time, it is provably sufficient to mitigate this behavior on the train data. This is notably false for standard (non-DP) methods, hence this observation has applications even when privacy is not required. For example, importance sampling is known to fail for standard SGD, but we show that it has exactly the intended effect for DP-trained models. Thus, with DP-SGD, unlike with SGD, we can influence test-time behavior by making principled train-time interventions. We use these insights to construct simple algorithms which match or outperform SOTA in several distributional robustness applications, and to significantly improve the privacy vs. disparate impact trade-off of DP-SGD. Finally, we also improve on known theoretical bounds relating differential privacy, stability, and distributional generalization.
    Generalised Latent Assimilation in Heterogeneous Reduced Spaces with Machine Learning Surrogate Models. (arXiv:2204.03497v1 [cs.LG])
    Reduced-order modelling and low-dimensional surrogate models generated using machine learning algorithms have been widely applied in high-dimensional dynamical systems to improve the algorithmic efficiency. In this paper, we develop a system which combines reduced-order surrogate models with a novel data assimilation (DA) technique used to incorporate real-time observations from different physical spaces. We make use of local smooth surrogate functions which link the space of encoded system variables and the one of current observations to perform variational DA with a low computational cost. The new system, named Generalised Latent Assimilation can benefit both the efficiency provided by the reduced-order modelling and the accuracy of data assimilation. A theoretical analysis of the difference between surrogate and original assimilation cost function is also provided in this paper where an upper bound, depending on the size of the local training set, is given. The new approach is tested on a high-dimensional CFD application of a two-phase liquid flow with non-linear observation operators that current Latent Assimilation methods can not handle. Numerical results demonstrate that the proposed assimilation approach can significantly improve the reconstruction and prediction accuracy of the deep learning surrogate model which is nearly 1000 times faster than the CFD simulation.
    Explicit Feature Interaction-aware Graph Neural Networks. (arXiv:2204.03225v1 [cs.LG])
    Graph neural networks are powerful methods to handle graph-structured data. However, existing graph neural networks only learn higher-order feature interactions implicitly. Thus, they cannot capture information that occurred in low-order feature interactions. To overcome this problem, we propose Explicit Feature Interaction-aware Graph Neural Network (EFI-GNN), which explicitly learns arbitrary-order feature interactions. EFI-GNN can jointly learn with any other graph neural network. We demonstrate that the joint learning method always enhances performance on the various node classification tasks. Furthermore, since EFI-GNN is inherently a linear model, we can interpret the prediction result of EFI-GNN. With the computation rule, we can obtain an any-order feature's effect on the decision. By that, we visualize the effects of the first-order and second-order features as a form of a heatmap.
    Half-sibling regression meets exoplanet imaging: PSF modeling and subtraction using a flexible, domain knowledge-driven, causal framework. (arXiv:2204.03439v1 [astro-ph.IM])
    High-contrast imaging of exoplanets hinges on powerful post-processing methods to denoise the data and separate the signal of a companion from its host star, which is typically orders of magnitude brighter. Existing post-processing algorithms do not use all prior domain knowledge that is available about the problem. We propose a new method that builds on our understanding of the systematic noise and the causal structure of the data-generating process. Our algorithm is based on a modified version of half-sibling regression (HSR), a flexible denoising framework that combines ideas from the fields of machine learning and causality. We adapt the method to address the specific requirements of high-contrast exoplanet imaging data obtained in pupil tracking mode. The key idea is to estimate the systematic noise in a pixel by regressing the time series of this pixel onto a set of causally independent, signal-free predictor pixels. We use regularized linear models in this work; however, other (non-linear) models are also possible. In a second step, we demonstrate how the HSR framework allows us to incorporate observing conditions such as wind speed or air temperature as additional predictors. When we apply our method to four data sets from the VLT/NACO instrument, our algorithm provides a better false-positive fraction than PCA-based PSF subtraction, a popular baseline method in the field. Additionally, we find that the HSR-based method provides direct and accurate estimates for the contrast of the exoplanets without the need to insert artificial companions for calibration in the data sets. Finally, we present first evidence that using the observing conditions as additional predictors can improve the results. Our HSR-based method provides an alternative, flexible and promising approach to the challenge of modeling and subtracting the stellar PSF and systematic noise in exoplanet imaging data.
    mulEEG: A Multi-View Representation Learning on EEG Signals. (arXiv:2204.03272v1 [cs.LG])
    Modeling effective representations using multiple views that positively influence each other is challenging, and the existing methods perform poorly on Electroencephalogram (EEG) signals for sleep-staging tasks. In this paper, we propose a novel multi-view self-supervised method (mulEEG) for unsupervised EEG representation learning. Our method attempts to effectively utilize the complementary information available in multiple views to learn better representations. We introduce diverse loss that further encourages complementary information across multiple views. Our method with no access to labels beats the supervised training while outperforming multi-view baseline methods on transfer learning experiments carried out on sleep-staging tasks. We posit that our method was able to learn better representations by using complementary multi-views.
    Perceive, Represent, Generate: Translating Multimodal Information to Robotic Motion Trajectories. (arXiv:2204.03051v1 [cs.RO])
    We present Perceive-Represent-Generate (PRG), a novel three-stage framework that maps perceptual information of different modalities (e.g., visual or sound), corresponding to a sequence of instructions, to an adequate sequence of movements to be executed by a robot. In the first stage, we perceive and pre-process the given inputs, isolating individual commands from the complete instruction provided by a human user. In the second stage we encode the individual commands into a multimodal latent space, employing a deep generative model. Finally, in the third stage we convert the multimodal latent values into individual trajectories and combine them into a single dynamic movement primitive, allowing its execution in a robotic platform. We evaluate our pipeline in the context of a novel robotic handwriting task, where the robot receives as input a word through different perceptual modalities (e.g., image, sound), and generates the corresponding motion trajectory to write it, creating coherent and readable handwritten words.
    DiffCloud: Real-to-Sim from Point Clouds with Differentiable Simulation and Rendering of Deformable Objects. (arXiv:2204.03139v1 [cs.RO])
    Research in manipulation of deformable objects is typically conducted on a limited range of scenarios, because handling each scenario on hardware takes significant effort. Realistic simulators with support for various types of deformations and interactions have the potential to speed up experimentation with novel tasks and algorithms. However, for highly deformable objects it is challenging to align the output of a simulator with the behavior of real objects. Manual tuning is not intuitive, hence automated methods are needed. We view this alignment problem as a joint perception-inference challenge and demonstrate how to use recent neural network architectures to successfully perform simulation parameter inference from real point clouds. We analyze the performance of various architectures, comparing their data and training requirements. Furthermore, we propose to leverage differentiable point cloud sampling and differentiable simulation to significantly reduce the time to achieve the alignment. We employ an efficient way to propagate gradients from point clouds to simulated meshes and further through to the physical simulation parameters, such as mass and stiffness. Experiments with highly deformable objects show that our method can achieve comparable or better alignment with real object behavior, while reducing the time needed to achieve this by more than an order of magnitude. Videos and supplementary material are available at https://tinyurl.com/diffcloud.
    Machine Learning-Enabled IoT Security: Open Issues and Challenges Under Advanced Persistent Threats. (arXiv:2204.03433v1 [cs.CR])
    Despite its technological benefits, Internet of Things (IoT) has cyber weaknesses due to the vulnerabilities in the wireless medium. Machine learning (ML)-based methods are widely used against cyber threats in IoT networks with promising performance. Advanced persistent threat (APT) is prominent for cybercriminals to compromise networks, and it is crucial to long-term and harmful characteristics. However, it is difficult to apply ML-based approaches to identify APT attacks to obtain a promising detection performance due to an extremely small percentage among normal traffic. There are limited surveys to fully investigate APT attacks in IoT networks due to the lack of public datasets with all types of APT attacks. It is worth to bridge the state-of-the-art in network attack detection with APT attack detection in a comprehensive review article. This survey article reviews the security challenges in IoT networks and presents the well-known attacks, APT attacks, and threat models in IoT systems. Meanwhile, signature-based, anomaly-based, and hybrid intrusion detection systems are summarized for IoT networks. The article highlights statistical insights regarding frequently applied ML-based methods against network intrusion alongside the number of attacks types detected. Finally, open issues and challenges for common network intrusion and APT attacks are presented for future research.
    Few-Shot Forecasting of Time-Series with Heterogeneous Channels. (arXiv:2204.03456v1 [cs.LG])
    Learning complex time series forecasting models usually requires a large amount of data, as each model is trained from scratch for each task/data set. Leveraging learning experience with similar datasets is a well-established technique for classification problems called few-shot classification. However, existing approaches cannot be applied to time-series forecasting because i) multivariate time-series datasets have different channels and ii) forecasting is principally different from classification. In this paper we formalize the problem of few-shot forecasting of time-series with heterogeneous channels for the first time. Extending recent work on heterogeneous attributes in vector data, we develop a model composed of permutation-invariant deep set-blocks which incorporate a temporal embedding. We assemble the first meta-dataset of 40 multivariate time-series datasets and show through experiments that our model provides a good generalization, outperforming baselines carried over from simpler scenarios that either fail to learn across tasks or miss temporal information.
    Domain Adaptation for Time-Series Classification to Mitigate Covariate Shift. (arXiv:2204.03342v1 [cs.LG])
    The performance of a machine learning model degrades when it is applied to data from a similar but different domain than the data it has initially been trained on. To mitigate this domain shift problem, domain adaptation (DA) techniques search for an optimal transformation that converts the (current) input data from a source domain to a target domain to learn a domain-invariant representations that reduces domain discrepancy. This paper proposes a novel supervised domain adaptation based on two steps. First, we search for an optimal class-dependent transformation from the source to the target domain from a few samples. We consider optimal transport methods such as the earth mover distance with Laplacian regularization, Sinkhorn transport and correlation alignment. Second, we use embedding similarity techniques to select the corresponding transformation at inference. We use correlation metrics and maximum mean discrepancy with higher-order moment matching techniques. We conduct an extensive evaluation on time-series datasets with domain shift including simulated and various online handwriting datasets to demonstrate the performance.
    Energy-Efficient Adaptive Machine Learning on IoT End-Nodes With Class-Dependent Confidence. (arXiv:2204.03431v1 [cs.LG])
    Energy-efficient machine learning models that can run directly on edge devices are of great interest in IoT applications, as they can reduce network pressure and response latency, and improve privacy. An effective way to obtain energy-efficiency with small accuracy drops is to sequentially execute a set of increasingly complex models, early-stopping the procedure for "easy" inputs that can be confidently classified by the smallest models. As a stopping criterion, current methods employ a single threshold on the output probabilities produced by each model. In this work, we show that such a criterion is sub-optimal for datasets that include classes of different complexity, and we demonstrate a more general approach based on per-classes thresholds. With experiments on a low-power end-node, we show that our method can significantly reduce the energy consumption compared to the single-threshold approach.
    PALBERT: Teaching ALBERT to Ponder. (arXiv:2204.03276v1 [cs.LG])
    Currently, pre-trained models can be considered the default choice for a wide range of NLP tasks. Despite their SoTA results, there is practical evidence that these models may require a different number of computing layers for different input sequences, since evaluating all layers leads to overconfidence on wrong predictions (namely overthinking). This problem can potentially be solved by implementing adaptive computation time approaches, which were first designed to improve inference speed. Recently proposed PonderNet may be a promising solution for performing an early exit by treating the exit layers index as a latent variable. However, the originally proposed exit criterion, relying on sampling from trained posterior distribution on the probability of exiting from i-th layer, introduces major variance in model outputs, significantly reducing the resulting models performance. In this paper, we propose Ponder ALBERT (PALBERT): an improvement to PonderNet with a novel deterministic Q-exit criterion and a revisited model architecture. We compared PALBERT with recent methods for performing an early exit. We observed that the proposed changes can be considered significant improvements on the original PonderNet architecture and outperform PABEE on a wide range of GLUE tasks. In addition, we also performed an in-depth ablation study of the proposed architecture to further understand Lambda layers and their performance.
    Continual Inference: A Library for Efficient Online Inference with Deep Neural Networks in PyTorch. (arXiv:2204.03418v1 [cs.LG])
    We present Continual Inference, a Python library for implementing Continual Inference Networks (CINs) in PyTorch, a class of Neural Networks designed specifically for efficient inference in both online and batch processing scenarios. We offer a comprehensive introduction and guide to CINs and their implementation in practice, and provide best-practices and code examples for composing complex modules for modern Deep Learning. Continual Inference is readily downloadable via the Python Package Index and at \url{www.github.com/lukashedegaard/continual-inference}.
    Using Decision Tree as Local Interpretable Model in Autoencoder-based LIME. (arXiv:2204.03321v1 [cs.LG])
    Nowadays, deep neural networks are being used in many domains because of their high accuracy results. However, they are considered as "black box", means that they are not explainable for humans. On the other hand, in some tasks such as medical, economic, and self-driving cars, users want the model to be interpretable to decide if they can trust these results or not. In this work, we present a modified version of an autoencoder-based approach for local interpretability called ALIME. The ALIME itself is inspired by a famous method called Local Interpretable Model-agnostic Explanations (LIME). LIME generates a single instance level explanation by generating new data around the instance and training a local linear interpretable model. ALIME uses an autoencoder to weigh the new data around the sample. Nevertheless, the ALIME uses a linear model as the interpretable model to be trained locally, just like the LIME. This work proposes a new approach, which uses a decision tree instead of the linear model, as the interpretable model. We evaluate the proposed model in case of stability, local fidelity, and interpretability on different datasets. Compared to ALIME, the experiments show significant results on stability and local fidelity and improved results on interpretability.
    Fusing finetuned models for better pretraining. (arXiv:2204.03044v1 [cs.CL])
    Pretrained models are the standard starting point for training. This approach consistently outperforms the use of a random initialization. However, pretraining is a costly endeavour that few can undertake. In this paper, we create better base models at hardly any cost, by fusing multiple existing fine tuned models into one. Specifically, we fuse by averaging the weights of these models. We show that the fused model results surpass the pretrained model ones. We also show that fusing is often better than intertraining. We find that fusing is less dependent on the target task. Furthermore, weight decay nullifies intertraining effects but not those of fusing.
    Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators. (arXiv:2204.03243v1 [cs.CL])
    We present a new framework AMOS that pretrains text encoders with an Adversarial learning curriculum via a Mixture Of Signals from multiple auxiliary generators. Following ELECTRA-style pretraining, the main encoder is trained as a discriminator to detect replaced tokens generated by auxiliary masked language models (MLMs). Different from ELECTRA which trains one MLM as the generator, we jointly train multiple MLMs of different sizes to provide training signals at various levels of difficulty. To push the discriminator to learn better with challenging replaced tokens, we learn mixture weights over the auxiliary MLMs' outputs to maximize the discriminator loss by backpropagating the gradient from the discriminator via Gumbel-Softmax. For better pretraining efficiency, we propose a way to assemble multiple MLMs into one unified auxiliary model. AMOS outperforms ELECTRA and recent state-of-the-art pretrained models by about 1 point on the GLUE benchmark for BERT base-sized models.
    A Joint Learning Approach for Semi-supervised Neural Topic Modeling. (arXiv:2204.03208v1 [cs.IR])
    Topic models are some of the most popular ways to represent textual data in an interpret-able manner. Recently, advances in deep generative models, specifically auto-encoding variational Bayes (AEVB), have led to the introduction of unsupervised neural topic models, which leverage deep generative models as opposed to traditional statistics-based topic models. We extend upon these neural topic models by introducing the Label-Indexed Neural Topic Model (LI-NTM), which is, to the extent of our knowledge, the first effective upstream semi-supervised neural topic model. We find that LI-NTM outperforms existing neural topic models in document reconstruction benchmarks, with the most notable results in low labeled data regimes and for data-sets with informative labels; furthermore, our jointly learned classifier outperforms baseline classifiers in ablation studies.
    Distributed Statistical Min-Max Learning in the Presence of Byzantine Agents. (arXiv:2204.03187v1 [cs.LG])
    Recent years have witnessed a growing interest in the topic of min-max optimization, owing to its relevance in the context of generative adversarial networks (GANs), robust control and optimization, and reinforcement learning. Motivated by this line of work, we consider a multi-agent min-max learning problem, and focus on the emerging challenge of contending with worst-case Byzantine adversarial agents in such a setup. By drawing on recent results from robust statistics, we design a robust distributed variant of the extra-gradient algorithm - a popular algorithmic approach for min-max optimization. Our main contribution is to provide a crisp analysis of the proposed robust extra-gradient algorithm for smooth convex-concave and smooth strongly convex-strongly concave functions. Specifically, we establish statistical rates of convergence to approximate saddle points. Our rates are near-optimal, and reveal both the effect of adversarial corruption and the benefit of collaboration among the non-faulty agents. Notably, this is the first paper to provide formal theoretical guarantees for large-scale distributed min-max learning in the presence of adversarial agents.
    Transformer-Based Language Models for Software Vulnerability Detection: Performance, Model's Security and Platforms. (arXiv:2204.03214v1 [cs.CR])
    The large transformer-based language models demonstrate excellent performance in natural language processing. By considering the closeness of natural languages to the high-level programming language such as C/C++, this work studies how good are the large transformer-based language models detecting software vulnerabilities. Our results demonstrate the well performance of these models on software vulnerability detection. The answer enables extending transformer-based language models to vulnerability detection and leveraging superior performance beyond the natural language processing domain. Besides, we perform the model's security check using Microsoft's Counterfit, a command-line tool to assess the model's security. Our results find that these models are vulnerable to adversarial examples. In this regard, we present a simple countermeasure and its result. Experimenting with large models is always a challenge due to the requirement of computing resources and platforms/libraries & dependencies. Based on the experiences and difficulties we faced during this work, we present our recommendation while choosing the platforms to run these large models. Moreover, the popular platforms are surveyed thoroughly in this paper.
    Offline Reinforcement Learning for Safer Blood Glucose Control in People with Type 1 Diabetes. (arXiv:2204.03376v1 [cs.LG])
    Hybrid closed loop systems represent the future of care for people with type 1 diabetes (T1D). These devices usually utilise simple control algorithms to select the optimal insulin dose for maintaining blood glucose levels within a healthy range. Online reinforcement learning (RL) has been utilised as a method for further enhancing glucose control in these devices. Previous approaches have been shown to reduce patient risk and improve time spent in the target range when compared to classical control algorithms, but are prone to instability in the learning process, often resulting in the selection of unsafe actions. This work presents an evaluation of offline RL as a means for developing clinically effective dosing policies without the need for patient interaction. This paper examines the utility of BCQ, CQL and TD3-BC in managing the blood glucose of nine virtual patients within the UVA/Padova glucose dynamics simulator. When trained on less than a tenth of the data required by online RL approaches, this work shows that offline RL can significantly increase time in the healthy blood glucose range when compared to the strongest state-of-art baseline. This is achieved without any associated increase in low blood glucose events. Offline RL is also shown to be able to correct for common and challenging scenarios such as incorrect bolus dosing, irregular meal timings and sub-optimal training data.
    Graph Neural Networks Designed for Different Graph Types: A Survey. (arXiv:2204.03080v1 [cs.LG])
    Graphs are ubiquitous in nature and can therefore serve as models for many practical but also theoretical problems. Based on this, the young research field of Graph Neural Networks (GNNs) has emerged. Despite the youth of the field and the speed in which new models are developed, many good surveys have been published in the last years. Nevertheless, an overview on which graph types can be modeled by GNNs is missing. In this survey, we give a detailed overview of already existing GNNs and, unlike previous surveys, categorize them according to their ability to handle different graph types. We consider GNNs operating on static as well as on dynamic graphs of different structural constitutions, with or without node or edge attributes. Moreover in the dynamic case, we separate the models in discrete-time and continuous-time dynamic graphs based on their architecture. According to our findings, there are still graph types, that are not covered by existing GNN models. Specifically, models concerning heterogeneity in attributes are missing and the deletion of nodes and edges is only covered rarely.
    Jacobian Norm for Unsupervised Source-Free Domain Adaptation. (arXiv:2204.03467v1 [cs.LG])
    Unsupervised Source (data) Free domain adaptation (USFDA) aims to transfer knowledge from a well-trained source model to a related but unlabeled target domain. In such a scenario, all conventional adaptation methods that require source data fail. To combat this challenge, existing USFDAs turn to transfer knowledge by aligning the target feature to the latent distribution hidden in the source model. However, such information is naturally limited. Thus, the alignment in such a scenario is not only difficult but also insufficient, which degrades the target generalization performance. To relieve this dilemma in current USFDAs, we are motivated to explore a new perspective to boost their performance. For this purpose and gaining necessary insight, we look back upon the origin of the domain adaptation and first theoretically derive a new-brand target generalization error bound based on the model smoothness. Then, following the theoretical insight, a general and model-smoothness-guided Jacobian norm (JN) regularizer is designed and imposed on the target domain to mitigate this dilemma. Extensive experiments are conducted to validate its effectiveness. In its implementation, just with a few lines of codes added to the existing USFDAs, we achieve superior results on various benchmark datasets.
    Multi-Task Distributed Learning using Vision Transformer with Random Patch Permutation. (arXiv:2204.03500v1 [cs.LG])
    The widespread application of artificial intelligence in health research is currently hampered by limitations in data availability. Distributed learning methods such as federated learning (FL) and shared learning (SL) are introduced to solve this problem as well as data management and ownership issues with their different strengths and weaknesses. The recent proposal of federated split task-agnostic (FeSTA) learning tries to reconcile the distinct merits of FL and SL by enabling the multi-task collaboration between participants through Vision Transformer (ViT) architecture, but they suffer from higher communication overhead. To address this, here we present a multi-task distributed learning using ViT with random patch permutation. Instead of using a CNN based head as in FeSTA, p-FeSTA adopts a randomly permuting simple patch embedder, improving the multi-task learning performance without sacrificing privacy. Experimental results confirm that the proposed method significantly enhances the benefit of multi-task collaboration, communication efficiency, and privacy preservation, shedding light on practical multi-task distributed learning in the field of medical imaging.
    Self-Supervised Learning to Prove Equivalence Between Programs via Semantics-Preserving Rewrite Rules. (arXiv:2109.10476v2 [cs.LG] UPDATED)
    We target the problem of automatically synthesizing proofs of semantic equivalence between two programs made of sequences of statements. We represent programs using abstract syntax trees (AST), where a given set of semantics-preserving rewrite rules can be applied on a specific AST pattern to generate a transformed and semantically equivalent program. In our system, two programs are equivalent if there exists a sequence of application of these rewrite rules that leads to rewriting one program into the other. We propose a neural network architecture based on a transformer model to generate proofs of equivalence between program pairs. The system outputs a sequence of rewrites, and the validity of the sequence is simply checked by verifying it can be applied. If no valid sequence is produced by the neural network, the system reports the programs as non-equivalent, ensuring by design no programs may be incorrectly reported as equivalent. Our system is fully implemented for a given grammar. To efficiently train the system to generate such sequences, we develop an original incremental training technique, named self-supervised sample selection. We extensively study the effectiveness of this novel training approach on proofs of increasing complexity and length. Our system, S4Eq, achieves 97% proof success on a curated dataset of 10,000 pairs of equivalent programs.
    RF Signal Transformation and Classification using Deep Neural Networks. (arXiv:2204.03564v1 [eess.SP])
    Deep neural networks (DNNs) designed for computer vision and natural language processing tasks cannot be directly applied to the radio frequency (RF) datasets. To address this challenge, we propose to convert the raw RF data to data types that are suitable for off-the-shelf DNNs by introducing a convolutional transform technique. In addition, we propose a simple 5-layer convolutional neural network architecture (CONV-5) that can operate with raw RF I/Q data without any transformation. Further, we put forward an RF dataset, referred to as RF1024, to facilitate future RF research. RF1024 consists of 8 different RF modulation classes with each class having 1000/200 training/test samples. Each sample of the RF1024 dataset contains 1024 complex I/Q values. Lastly, the experiments are performed on the RadioML2016 and RF1024 datasets to demonstrate the improved classification performance.
    AUV-Net: Learning Aligned UV Maps for Texture Transfer and Synthesis. (arXiv:2204.03105v1 [cs.CV])
    In this paper, we address the problem of texture representation for 3D shapes for the challenging and underexplored tasks of texture transfer and synthesis. Previous works either apply spherical texture maps which may lead to large distortions, or use continuous texture fields that yield smooth outputs lacking details. We argue that the traditional way of representing textures with images and linking them to a 3D mesh via UV mapping is more desirable, since synthesizing 2D images is a well-studied problem. We propose AUV-Net which learns to embed 3D surfaces into a 2D aligned UV space, by mapping the corresponding semantic parts of different 3D shapes to the same location in the UV space. As a result, textures are aligned across objects, and can thus be easily synthesized by generative models of images. Texture alignment is learned in an unsupervised manner by a simple yet effective texture alignment module, taking inspiration from traditional works on linear subspace learning. The learned UV mapping and aligned texture representations enable a variety of applications including texture transfer, texture synthesis, and textured single view 3D reconstruction. We conduct experiments on multiple datasets to demonstrate the effectiveness of our method. Project page: https://nv-tlabs.github.io/AUV-NET.
    FedADMM: A Robust Federated Deep Learning Framework with Adaptivity to System Heterogeneity. (arXiv:2204.03529v1 [cs.LG])
    Federated Learning (FL) is an emerging framework for distributed processing of large data volumes by edge devices subject to limited communication bandwidths, heterogeneity in data distributions and computational resources, as well as privacy considerations. In this paper, we introduce a new FL protocol termed FedADMM based on primal-dual optimization. The proposed method leverages dual variables to tackle statistical heterogeneity, and accommodates system heterogeneity by tolerating variable amount of work performed by clients. FedADMM maintains identical communication costs per round as FedAvg/Prox, and generalizes them via the augmented Lagrangian. A convergence proof is established for nonconvex objectives, under no restrictions in terms of data dissimilarity or number of participants per round of the algorithm. We demonstrate the merits through extensive experiments on real datasets, under both IID and non-IID data distributions across clients. FedADMM consistently outperforms all baseline methods in terms of communication efficiency, with the number of rounds needed to reach a prescribed accuracy reduced by up to 87%. The algorithm effectively adapts to heterogeneous data distributions through the use of dual variables, without the need for hyperparameter tuning, and its advantages are more pronounced in large-scale systems.
    MTI-Net: A Multi-Target Speech Intelligibility Prediction Model. (arXiv:2204.03310v1 [eess.AS])
    Recently, deep learning (DL)-based non-intrusive speech assessment models have attracted great attention. Many studies report that these DL-based models yield satisfactory assessment performance and good flexibility, but their performance in unseen environments remains a challenge. Furthermore, compared to quality scores, fewer studies elaborate deep learning models to estimate intelligibility scores. This study proposes a multi-task speech intelligibility prediction model, called MTI-Net, for simultaneously predicting human and machine intelligibility measures. Specifically, given a speech utterance, MTI-Net is designed to predict subjective listening test results and word error rate (WER) scores. We also investigate several methods that can improve the prediction performance of MTI-Net. First, we compare different features (including low-level features and embeddings from self-supervised learning (SSL) models) and prediction targets of MTI-Net. Second, we explore the effect of transfer learning and multi-tasking learning on training MTI-Net. Finally, we examine the potential advantages of fine-tuning SSL embeddings. Experimental results demonstrate the effectiveness of using cross-domain features, multi-task learning, and fine-tuning SSL embeddings. Furthermore, it is confirmed that the intelligibility and WER scores predicted by MTI-Net are highly correlated with the ground-truth scores.
    AI-aided Traffic Control Scheme for M2M Communications in the Internet of Vehicles. (arXiv:2204.03504v1 [cs.NI])
    Due to the rapid growth of data transmissions in internet of vehicles (IoV), finding schemes that can effectively alleviate access congestion has become an important issue. Recently, many traffic control schemes have been studied. Nevertheless, the dynamics of traffic and the heterogeneous requirements of different IoV applications are not considered in most existing studies, which is significant for the random access resource allocation. In this paper, we consider a hybrid traffic control scheme and use proximal policy optimization (PPO) method to tackle it. Firstly, IoV devices are divided into various classes based on delay characteristics. The target of maximizing the successful transmission of packets with the success rate constraint is established. Then, the optimization objective is transformed into a markov decision process (MDP) model. Finally, the access class barring (ACB) factors are obtained based on the PPO method to maximize the number of successful access devices. The performance of the proposal algorithm in respect of successful events and delay compared to existing schemes is verified by simulations.
    Enabling Deep Learning for All-in EDGE paradigm. (arXiv:2204.03326v1 [cs.LG])
    Deep Learning-based models have been widely investigated, and they have demonstrated significant performance on non-trivial tasks such as speech recognition, image processing, and natural language understanding. However, this is at the cost of substantial data requirements. Considering the widespread proliferation of edge devices (e.g. Internet of Things devices) over the last decade, Deep Learning in the edge paradigm, such as device-cloud integrated platforms, is required to leverage its superior performance. Moreover, it is suitable from the data requirements perspective in the edge paradigm because the proliferation of edge devices has resulted in an explosion in the volume of generated and collected data. However, there are difficulties due to other requirements such as high computation, high latency, and high bandwidth caused by Deep Learning applications in real-world scenarios. In this regard, this survey paper investigates Deep Learning at the edge, its architecture, enabling technologies, and model adaption techniques, where edge servers and edge devices participate in deep learning training and inference. For simplicity, we call this paradigm the All-in EDGE paradigm. Besides, this paper presents the key performance metrics for Deep Learning at the All-in EDGE paradigm to evaluate various deep learning techniques and choose a suitable design. Moreover, various open challenges arising from the deployment of Deep Learning at the All-in EDGE paradigm are identified and discussed.
    Standardized feature extraction from pairwise conflicts applied to the train rescheduling problem. (arXiv:2204.03061v1 [cs.LG])
    We propose a train rescheduling algorithm which applies a standardized feature selection based on pairwise conflicts in order to serve as input for the reinforcement learning framework. We implement an analytical method which identifies and optimally solves every conflict arising between two trains, then we design a corresponding observation space which features the most relevant information considering these conflicts. The data obtained this way then translates to actions in the context of the reinforcement learning framework. We test our preliminary model using the evaluation metrics of the Flatland Challenge. The empirical results indicate that the suggested feature space provides meaningful observations, from which a sensible scheduling policy can be learned.
    Optimization Models and Interpretations for Three Types of Adversarial Perturbations against Support Vector Machines. (arXiv:2204.03154v1 [cs.LG])
    Adversarial perturbations have drawn great attentions in various deep neural networks. Most of them are computed by iterations and cannot be interpreted very well. In contrast, little attentions are paid to basic machine learning models such as support vector machines. In this paper, we investigate the optimization models and the interpretations for three types of adversarial perturbations against support vector machines, including sample-adversarial perturbations (sAP), class-universal adversarial perturbations (cuAP) as well as universal adversarial perturbations (uAP). For linear binary/multi classification support vector machines (SVMs), we derive the explicit solutions for sAP, cuAP and uAP (binary case), and approximate solution for uAP of multi-classification. We also obtain the upper bound of fooling rate for uAP. Such results not only increase the interpretability of the three adversarial perturbations, but also provide great convenience in computation since iterative process can be avoided. Numerical results show that our method is fast and effective in calculating three types of adversarial perturbations.
    Temporal Alignment for History Representation in Reinforcement Learning. (arXiv:2204.03525v1 [cs.LG])
    Environments in Reinforcement Learning are usually only partially observable. To address this problem, a possible solution is to provide the agent with information about the past. However, providing complete observations of numerous steps can be excessive. Inspired by human memory, we propose to represent history with only important changes in the environment and, in our approach, to obtain automatically this representation using self-supervision. Our method (TempAl) aligns temporally-close frames, revealing a general, slowly varying state of the environment. This procedure is based on contrastive loss, which pulls embeddings of nearby observations to each other while pushing away other samples from the batch. It can be interpreted as a metric that captures the temporal relations of observations. We propose to combine both common instantaneous and our history representation and we evaluate TempAl on all available Atari games from the Arcade Learning Environment. TempAl surpasses the instantaneous-only baseline in 35 environments out of 49. The source code of the method and of all the experiments is available at https://github.com/htdt/tempal.
    Multi-task nonparallel support vector machine for classification. (arXiv:2204.02972v1 [cs.LG])
    Direct multi-task twin support vector machine (DMTSVM) explores the shared information between multiple correlated tasks, then it produces better generalization performance. However, it contains matrix inversion operation when solving the dual problems, so it costs much running time. Moreover, kernel trick cannot be directly utilized in the nonlinear case. To effectively avoid above problems, a novel multi-task nonparallel support vector machine (MTNPSVM) including linear and nonlinear cases is proposed in this paper. By introducing epsilon-insensitive loss instead of square loss in DMTSVM, MTNPSVM effectively avoids matrix inversion operation and takes full advantage of the kernel trick. Theoretical implication of the model is further discussed. To further improve the computational efficiency, the alternating direction method of multipliers (ADMM) is employed when solving the dual problem. The computational complexity and convergence of the algorithm are provided. In addition, the property and sensitivity of the parameter in model are further explored. The experimental results on fifteen benchmark datasets and twelve image datasets demonstrate the validity of MTNPSVM in comparison with the state-of-the-art algorithms. Finally, it is applied to real Chinese Wine dataset, and also verifies its effectiveness.
    Federated Learning for Distributed Spectrum Sensing in NextG Communication Networks. (arXiv:2204.03027v1 [cs.NI])
    NextG networks are intended to provide the flexibility of sharing the spectrum with incumbent users and support various spectrum monitoring tasks such as anomaly detection, fault diagnostics, user equipment identification, and authentication. A network of wireless sensors is needed to monitor the spectrum for signal transmissions of interest over a large deployment area. Each sensor receives signals under a specific channel condition depending on its location and trains an individual model of a deep neural network (DNN) accordingly to classify signals. To improve the accuracy, individual sensors may exchange sensing data or sensor results with each other or with a fusion center (such as in cooperative spectrum sensing). In this paper, distributed federated learning over a multi-hop wireless network is considered to collectively train a DNN for signal identification. In distributed federated learning, each sensor broadcasts its trained model to its neighbors, collects the DNN models from its neighbors, and aggregates them to initialize its own model for the next round of training. Without exchanging any spectrum data, this process is repeated over time such that a common DNN is built across the network while preserving the privacy associated with signals collected at different locations. Signal classification accuracy and convergence time are evaluated for different network topologies (including line, star, ring, grid, and random networks) and packet loss events. Then, the reduction of communication overhead and energy consumption is considered with random participation of sensors in model updates. The results show the feasibility of extending cooperative spectrum sensing over a general multi-hop wireless network through federated learning and indicate its robustness to wireless network effects, thereby sustaining high accuracy with low communication overhead and energy consumption.
    Faster algorithms for learning to link, align sequences, and price two-part tariffs. (arXiv:2204.03569v1 [cs.DS])
    Data-driven algorithm configuration is a promising, learning-based approach for beyond worst-case analysis of algorithms with tunable parameters. An important open problem is the design of efficient data-driven algorithms for algorithm families with more than one parameter. In this work we provide algorithms for efficient (output-polynomial) multidimensional parameter tuning, i.e. for families with a small constant number of parameters, for three very different combinatorial problems -- linkage-based clustering, dynamic programming for sequence alignment, and auction design for two-part tariff schemes. We extend the single-parameter clustering algorithm of Balcan et al. 2020 arXiv:1907.00533 to multiple parameters and to the sequence alignment problem by proposing an execution graph which compactly represents all the states the algorithm could attain for all possible parameter values. A key problem-specific challenge is to efficiently compute how the partition of the parameter space (into regions with unique algorithmic states) changes with a single algorithmic step. We give algorithms which improve on the runtime of previously best known results for linkage-based clustering, sequence alignment and two-part tariff pricing.
    The Effects of Regularization and Data Augmentation are Class Dependent. (arXiv:2204.03632v1 [cs.LG])
    Regularization is a fundamental technique to prevent over-fitting and to improve generalization performances by constraining a model's complexity. Current Deep Networks heavily rely on regularizers such as Data-Augmentation (DA) or weight-decay, and employ structural risk minimization, i.e. cross-validation, to select the optimal regularization hyper-parameters. In this study, we demonstrate that techniques such as DA or weight decay produce a model with a reduced complexity that is unfair across classes. The optimal amount of DA or weight decay found from cross-validation leads to disastrous model performances on some classes e.g. on Imagenet with a resnet50, the "barn spider" classification test accuracy falls from $68\%$ to $46\%$ only by introducing random crop DA during training. Even more surprising, such performance drop also appears when introducing uninformative regularization techniques such as weight decay. Those results demonstrate that our search for ever increasing generalization performance -- averaged over all classes and samples -- has left us with models and regularizers that silently sacrifice performances on some classes. This scenario can become dangerous when deploying a model on downstream tasks e.g. an Imagenet pre-trained resnet50 deployed on INaturalist sees its performances fall from $70\%$ to $30\%$ on class \#8889 when introducing random crop DA during the Imagenet pre-training phase. Those results demonstrate that designing novel regularizers without class-dependent bias remains an open research question.
    Knowledge Infused Decoding. (arXiv:2204.03084v1 [cs.CL])
    Pre-trained language models (LMs) have been shown to memorize a substantial amount of knowledge from the pre-training corpora; however, they are still limited in recalling factually correct knowledge given a certain context. Hence, they tend to suffer from counterfactual or hallucinatory generation when used in knowledge-intensive natural language generation (NLG) tasks. Recent remedies to this problem focus on modifying either the pre-training or task fine-tuning objectives to incorporate knowledge, which normally require additional costly training or architecture modification of LMs for practical applications. We present Knowledge Infused Decoding (KID) -- a novel decoding algorithm for generative LMs, which dynamically infuses external knowledge into each step of the LM decoding. Specifically, we maintain a local knowledge memory based on the current context, interacting with a dynamically created external knowledge trie, and continuously update the local memory as a knowledge-aware constraint to guide decoding via reinforcement learning. On six diverse knowledge-intensive NLG tasks, task-agnostic LMs (e.g., GPT-2 and BART) armed with KID outperform many task-optimized state-of-the-art models, and show particularly strong performance in few-shot scenarios over seven related knowledge-infusion techniques. Human evaluation confirms KID's ability to generate more relevant and factual language for the input context when compared with multiple baselines. Finally, KID also alleviates exposure bias and provides stable generation quality when generating longer sequences. Code for KID is available at https://github.com/microsoft/KID.
    Enhancement on Model Interpretability and Sleep Stage Scoring Performance with A Novel Pipeline Based on Deep Neural Network. (arXiv:2204.03173v1 [cs.LG])
    Considering the natural frequency characteristics in sleep medicine, this paper first proposes a time-frequency framework for the representation learning of the electroencephalogram (EEG) following the definition of the American Academy of Sleep Medicine. To meet the temporal-random and transient nature of the defining characteristics of sleep stages, we further design a context-sensitive flexible pipeline that automatically adapts to the attributes of data itself. That is, the input EEG spectrogram is partitioned into a sequence of patches in the time and frequency axes, and then input to a delicate deep learning network for further representation learning to extract the stage-dependent features, which are used in the classification step finally. The proposed pipeline is validated against a large database, i.e., the Sleep Heart Health Study (SHHS), and the results demonstrate that the competitive performance for the wake, N2, and N3 stages outperforms the state-of-art works, with the F1 scores being 0.93, 0.88, and 0.87, respectively, and the proposed method has a high inter-rater reliability of 0.80 kappa. Importantly, we visualize the stage scoring process of the model decision with the Layer-wise Relevance Propagation (LRP) method, which shows that the proposed pipeline is more sensitive and perceivable in the decision-making process than the baseline pipelines. Therefore, the pipeline together with the LRP method can provide better model interpretability, which is important for clinical support.
    Video Diffusion Models. (arXiv:2204.03458v1 [cs.CV])
    Generating temporally coherent high fidelity video is an important milestone in generative modeling research. We make progress towards this milestone by proposing a diffusion model for video generation that shows very promising initial results. Our model is a natural extension of the standard image diffusion architecture, and it enables jointly training from image and video data, which we find to reduce the variance of minibatch gradients and speed up optimization. To generate long and higher resolution videos we introduce a new conditional sampling technique for spatial and temporal video extension that performs better than previously proposed methods. We present the first results on a large text-conditioned video generation task, as well as state-of-the-art results on an established unconditional video generation benchmark. Supplementary material is available at https://video-diffusion.github.io/
    Incremental Unsupervised Feature Selection for Dynamic Incomplete Multi-view Data. (arXiv:2204.02973v1 [cs.LG])
    Multi-view unsupervised feature selection has been proven to be efficient in reducing the dimensionality of multi-view unlabeled data with high dimensions. The previous methods assume all of the views are complete. However, in real applications, the multi-view data are often incomplete, i.e., some views of instances are missing, which will result in the failure of these methods. Besides, while the data arrive in form of streams, these existing methods will suffer the issues of high storage cost and expensive computation time. To address these issues, we propose an Incremental Incomplete Multi-view Unsupervised Feature Selection method (I$^2$MUFS) on incomplete multi-view streaming data. By jointly considering the consistent and complementary information across different views, I$^2$MUFS embeds the unsupervised feature selection into an extended weighted non-negative matrix factorization model, which can learn a consensus clustering indicator matrix and fuse different latent feature matrices with adaptive view weights. Furthermore, we introduce the incremental leaning mechanisms to develop an alternative iterative algorithm, where the feature selection matrix is incrementally updated, rather than recomputing on the entire updated data from scratch. A series of experiments are conducted to verify the effectiveness of the proposed method by comparing with several state-of-the-art methods. The experimental results demonstrate the effectiveness and efficiency of the proposed method in terms of the clustering metrics and the computational cost.
    Enhancing Semantic Code Search with Multimodal Contrastive Learning and Soft Data Augmentation. (arXiv:2204.03293v1 [cs.SE])
    Code search aims to retrieve the most semantically relevant code snippet for a given natural language query. Recently, large-scale code pre-trained models such as CodeBERT and GraphCodeBERT learn generic representations of source code and have achieved substantial improvement on code search task. However, the high-quality sequence-level representations of code snippets have not been sufficiently explored. In this paper, we propose a new approach with multimodal contrastive learning and soft data augmentation for code search. Multimodal contrastive learning is used to pull together the representations of code-query pairs and push apart the unpaired code snippets and queries. Moreover, data augmentation is critical in contrastive learning for learning high-quality representations. However, only semantic-preserving augmentations for source code are considered in existing work. In this work, we propose to do soft data augmentation by dynamically masking and replacing some tokens in code sequences to generate code snippets that are similar but not necessarily semantic-preserving as positive samples for paired queries. We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages. The experimental results show that our approach significantly outperforms the state-of-the-art methods. We also adapt our techniques to several pre-trained models such as RoBERTa and CodeBERT, and significantly boost their performance on the code search task.
    Self supervised learning for robust voice cloning. (arXiv:2204.03421v1 [cs.SD])
    Voice cloning is a difficult task which requires robust and informative features incorporated in a high quality TTS system in order to effectively copy an unseen speaker's voice. In our work, we utilize features learned in a self-supervised framework via the Bootstrap Your Own Latent (BYOL) method, which is shown to produce high quality speech representations when specific audio augmentations are applied to the vanilla algorithm. We further extend the augmentations in the training procedure to aid the resulting features to capture the speaker identity and to make them robust to noise and acoustic conditions. The learned features are used as pre-trained utterance-level embeddings and as inputs to a Non-Attentive Tacotron based architecture, aiming to achieve multispeaker speech synthesis without utilizing additional speaker features. This method enables us to train our model in an unlabeled multispeaker dataset as well as use unseen speaker embeddings to copy a speaker's voice. Subjective and objective evaluations are used to validate the proposed model, as well as the robustness to the acoustic conditions of the target utterance.
    Adaptive Spike-Like Representation of EEG Signals for Sleep Stages Scoring. (arXiv:2204.03565v1 [eess.SP])
    Recently there has seen promising results on automatic stage scoring by extracting spatio-temporal features from electroencephalogram (EEG). Such methods entail laborious manual feature engineering and domain knowledge. In this study, we propose an adaptive scheme to probabilistically encode, filter and accumulate the input signals and weight the resultant features by the half-Gaussian probabilities of signal intensities. The adaptive representations are subsequently fed into a transformer model to automatically mine the relevance between features and corresponding stages. Extensive experiments on the largest public dataset against state-of-the-art methods validate the effectiveness of our proposed method and reveal promising future directions.
    Deep transfer learning for system identification using long short-term memory neural networks. (arXiv:2204.03125v1 [eess.SY])
    Recurrent neural networks (RNNs) have many advantages over more traditional system identification techniques. They may be applied to linear and nonlinear systems, and they require fewer modeling assumptions. However, these neural network models may also need larger amounts of data to learn and generalize. Furthermore, neural networks training is a time-consuming process. Hence, building upon long-short term memory neural networks (LSTM), this paper proposes using two types of deep transfer learning, namely parameter fine-tuning and freezing, to reduce the data and computation requirements for system identification. We apply these techniques to identify two dynamical systems, namely a second-order linear system and a Wiener-Hammerstein nonlinear system. Results show that compared with direct learning, our method accelerates learning by 10% to 50%, which also saves data and computing resources.
  • Open

    Covariate-assisted Sparse Tensor Completion. (arXiv:2103.06428v3 [stat.ML] UPDATED)
    We aim to provably complete a sparse and highly-missing tensor in the presence of covariate information along tensor modes. Our motivation comes from online advertising where users click-through-rates (CTR) on ads over various devices form a CTR tensor that has about 96% missing entries and has many zeros on non-missing entries, which makes the standalone tensor completion method unsatisfactory. Beside the CTR tensor, additional ad features or user characteristics are often available. In this paper, we propose Covariate-assisted Sparse Tensor Completion (COSTCO) to incorporate covariate information for the recovery of the sparse tensor. The key idea is to jointly extract latent components from both the tensor and the covariate matrix to learn a synthetic representation. Theoretically, we derive the error bound for the recovered tensor components and explicitly quantify the improvements on both the reveal probability condition and the tensor recovery accuracy due to covariates. Finally, we apply COSTCO to an advertisement dataset consisting of a CTR tensor and ad covariate matrix, leading to 23% accuracy improvement over the baseline. An important by-product is that ad latent components from COSTCO reveal interesting ad clusters, which are useful for better ad targeting.  ( 2 min )
    Covariance matrix preparation for quantum principal component analysis. (arXiv:2204.03495v1 [quant-ph])
    Principal component analysis (PCA) is a dimensionality reduction method in data analysis that involves diagonalizing the covariance matrix of the dataset. Recently, quantum algorithms have been formulated for PCA based on diagonalizing a density matrix. These algorithms assume that the covariance matrix can be encoded in a density matrix, but a concrete protocol for this encoding has been lacking. Our work aims to address this gap. Assuming amplitude encoding of the data, with the data given by the ensemble $\{p_i,| \psi_i \rangle\}$, then one can easily prepare the ensemble average density matrix $\overline{\rho} = \sum_i p_i |\psi_i\rangle \langle \psi_i |$. We first show that $\overline{\rho}$ is precisely the covariance matrix whenever the dataset is centered. For quantum datasets, we exploit global phase symmetry to argue that there always exists a centered dataset consistent with $\overline{\rho}$, and hence $\overline{\rho}$ can always be interpreted as a covariance matrix. This provides a simple means for preparing the covariance matrix for arbitrary quantum datasets or centered classical datasets. For uncentered classical datasets, our method is so-called "PCA without centering", which we interpret as PCA on a symmetrized dataset. We argue that this closely corresponds to standard PCA, and we derive equations and inequalities that bound the deviation of the spectrum obtained with our method from that of standard PCA. We numerically illustrate our method for the MNIST handwritten digit dataset. We also argue that PCA on quantum datasets is natural and meaningful, and we numerically implement our method for molecular ground-state datasets.  ( 2 min )
    Distributionally Robust Optimal Power Flow with Contextual Information. (arXiv:2109.07896v2 [math.OC] UPDATED)
    In this paper, we develop a distributionally robust chance-constrained formulation of the Optimal Power Flow problem (OPF) whereby the system operator can leverage contextual information. For this purpose, we exploit an ambiguity set based on probability trimmings and optimal transport through which the dispatch solution is protected against the incomplete knowledge of the relationship between the OPF uncertainties and the context that is conveyed by a sample of their joint probability distribution. We provide a tractable reformulation of the proposed distributionally robust chance-constrained OPF problem under the popular conditional-value-at-risk approximation. By way of numerical experiments run on a modified IEEE-118 bus network with wind uncertainty, we show how the power system can substantially benefit from taking into account the well-known statistical dependence between the point forecast of wind power outputs and its associated prediction error. Furthermore, the experiments conducted also reveal that the distributional robustness conferred on the OPF solution by our probability-trimmings-based approach is superior to that bestowed by alternative approaches in terms of expected cost and system reliability.  ( 2 min )
    A comparison of mixed-variables Bayesian optimization approaches. (arXiv:2111.01533v2 [math.OC] UPDATED)
    Most real optimization problems are defined over a mixed search space where the variables are both discrete and continuous. In engineering applications, the objective function is typically calculated with a numerically costly black-box simulation.General mixed and costly optimization problems are therefore of a great practical interest, yet their resolution remains in a large part an open scientific question. In this article, costly mixed problems are approached through Gaussian processes where the discrete variables are relaxed into continuous latent variables. The continuous space is more easily harvested by classical Bayesian optimization techniques than a mixed space would. Discrete variables are recovered either subsequently to the continuous optimization, or simultaneously with an additional continuous-discrete compatibility constraint that is handled with augmented Lagrangians. Several possible implementations of such Bayesian mixed optimizers are compared. In particular, the reformulation of the problem with continuous latent variables is put in competition with searches working directly in the mixed space. Among the algorithms involving latent variables and an augmented Lagrangian, a particular attention is devoted to the Lagrange multipliers for which a local and a global estimation techniques are studied. The comparisons are based on the repeated optimization of three analytical functions and a beam design problem.  ( 2 min )
    Mo\"ET: Mixture of Expert Trees and its Application to Verifiable Reinforcement Learning. (arXiv:1906.06717v4 [cs.LG] UPDATED)
    Rapid advancements in deep learning have led to many recent breakthroughs. While deep learning models achieve superior performance, often statistically better than humans, their adoption into safety-critical settings, such as healthcare or self-driving cars is hindered by their inability to provide safety guarantees or to expose the inner workings of the model in a human understandable form. We present Mo\"ET, a novel model based on Mixture of Experts, consisting of decision tree experts and a generalized linear model gating function. Thanks to such gating function the model is more expressive than the standard decision tree. To support non-differentiable decision trees as experts, we formulate a novel training procedure. In addition, we introduce a hard thresholding version, Mo\"ETH, in which predictions are made solely by a single expert chosen via the gating function. Thanks to that property, Mo\"ETH allows each prediction to be easily decomposed into a set of logical rules in a form which can be easily verified. While Mo\"ET is a general use model, we illustrate its power in the reinforcement learning setting. By training Mo\"ET models using an imitation learning procedure on deep RL agents we outperform the previous state-of-the-art technique based on decision trees while preserving the verifiability of the models. Moreover, we show that Mo\"ET can also be used in real-world supervised problems on which it outperforms other verifiable machine learning models.  ( 3 min )
    Amortized Auto-Tuning: Cost-Efficient Bayesian Transfer Optimization for Hyperparameter Recommendation. (arXiv:2106.09179v2 [cs.LG] UPDATED)
    With the surge in the number of hyperparameters and training times of modern machine learning models, hyperparameter tuning is becoming increasingly expensive. However, after assessing 40 tuning methods systematically, we find that each faces certain limitations. In particular, methods that speed up tuning via knowledge transfer typically require the final performance of hyperparameters and do not focus on low-fidelity information. As we demonstrate empirically, this common practice is suboptimal and can incur an unnecessary use of resources. It is more cost-efficient to instead leverage low-fidelity tuning observations to measure inter-task similarity and transfer knowledge from existing to new tasks accordingly. However, performing multi-fidelity tuning comes with its own challenges in the transfer setting: the noise in additional observations and the need for performance forecasting. Therefore, we propose and conduct a thorough analysis of a multi-task multi-fidelity Bayesian optimization framework, which leads to the best instantiation--amortized auto-tuning (AT2). We further present an offline-computed 27-task hyperparameter recommendation (HyperRec) database to serve the community. Extensive experiments on HyperRec and other real-world databases illustrate the effectiveness of our AT2 method.  ( 2 min )
    A Joint Learning Approach for Semi-supervised Neural Topic Modeling. (arXiv:2204.03208v1 [cs.IR])
    Topic models are some of the most popular ways to represent textual data in an interpret-able manner. Recently, advances in deep generative models, specifically auto-encoding variational Bayes (AEVB), have led to the introduction of unsupervised neural topic models, which leverage deep generative models as opposed to traditional statistics-based topic models. We extend upon these neural topic models by introducing the Label-Indexed Neural Topic Model (LI-NTM), which is, to the extent of our knowledge, the first effective upstream semi-supervised neural topic model. We find that LI-NTM outperforms existing neural topic models in document reconstruction benchmarks, with the most notable results in low labeled data regimes and for data-sets with informative labels; furthermore, our jointly learned classifier outperforms baseline classifiers in ablation studies.  ( 2 min )
    Differentially Private Set Union. (arXiv:2002.09745v2 [cs.CR] UPDATED)
    We study the basic operation of set union in the global model of differential privacy. In this problem, we are given a universe $U$ of items, possibly of infinite size, and a database $D$ of users. Each user $i$ contributes a subset $W_i \subseteq U$ of items. We want an ($\epsilon$,$\delta$)-differentially private algorithm which outputs a subset $S \subset \cup_i W_i$ such that the size of $S$ is as large as possible. The problem arises in countless real world applications; it is particularly ubiquitous in natural language processing (NLP) applications as vocabulary extraction. For example, discovering words, sentences, $n$-grams etc., from private text data belonging to users is an instance of the set union problem. Known algorithms for this problem proceed by collecting a subset of items from each user, taking the union of such subsets, and disclosing the items whose noisy counts fall above a certain threshold. Crucially, in the above process, the contribution of each individual user is always independent of the items held by other users, resulting in a wasteful aggregation process, where some item counts happen to be way above the threshold. We deviate from the above paradigm by allowing users to contribute their items in a $\textit{dependent fashion}$, guided by a $\textit{policy}$. In this new setting ensuring privacy is significantly delicate. We prove that any policy which has certain $\textit{contractive}$ properties would result in a differentially private algorithm. We design two new algorithms, one using Laplace noise and other Gaussian noise, as specific instances of policies satisfying the contractive properties. Our experiments show that the new algorithms significantly outperform previously known mechanisms for the problem.  ( 3 min )
    Visualizing Deep Neural Networks with Topographic Activation Maps. (arXiv:2204.03528v1 [cs.LG])
    Machine Learning with Deep Neural Networks (DNNs) has become a successful tool in solving tasks across various fields of application. The success of DNNs is strongly connected to their high complexity in terms of the number of network layers or of neurons in each layer, which severely complicates to understand how DNNs solve their learned task. To improve the explainability of DNNs, we adapt methods from neuroscience because this field has a rich experience in analyzing complex and opaque systems. In this work, we draw inspiration from how neuroscience uses topographic maps to visualize the activity of the brain when it performs certain tasks. Transferring this approach to DNNs can help to visualize and understand their internal processes more intuitively, too. However, the inner structures of brains and DNNs differ substantially. Therefore, to be able to visualize activations of neurons in DNNs as topographic maps, we research techniques to layout the neurons in a two-dimensional space in which neurons of similar activity are in the vicinity of each other. In this work, we introduce and compare different methods to obtain a topographic layout of the neurons in a network layer. Moreover, we demonstrate how to use the resulting topographic activation maps to identify errors or encoded biases in DNNs or data sets. Our novel visualization technique improves the transparency of DNN-based algorithmic decision-making systems and is accessible to a broad audience because topographic maps are intuitive to interpret without expert-knowledge in Machine Learning.
    The Effects of Regularization and Data Augmentation are Class Dependent. (arXiv:2204.03632v1 [cs.LG])
    Regularization is a fundamental technique to prevent over-fitting and to improve generalization performances by constraining a model's complexity. Current Deep Networks heavily rely on regularizers such as Data-Augmentation (DA) or weight-decay, and employ structural risk minimization, i.e. cross-validation, to select the optimal regularization hyper-parameters. In this study, we demonstrate that techniques such as DA or weight decay produce a model with a reduced complexity that is unfair across classes. The optimal amount of DA or weight decay found from cross-validation leads to disastrous model performances on some classes e.g. on Imagenet with a resnet50, the "barn spider" classification test accuracy falls from $68\%$ to $46\%$ only by introducing random crop DA during training. Even more surprising, such performance drop also appears when introducing uninformative regularization techniques such as weight decay. Those results demonstrate that our search for ever increasing generalization performance -- averaged over all classes and samples -- has left us with models and regularizers that silently sacrifice performances on some classes. This scenario can become dangerous when deploying a model on downstream tasks e.g. an Imagenet pre-trained resnet50 deployed on INaturalist sees its performances fall from $70\%$ to $30\%$ on class \#8889 when introducing random crop DA during the Imagenet pre-training phase. Those results demonstrate that designing novel regularizers without class-dependent bias remains an open research question.
    Categorical Distributions of Maximum Entropy under Marginal Constraints. (arXiv:2204.03406v1 [hep-th])
    The estimation of categorical distributions under marginal constraints summarizing some sample from a population in the most-generalizable way is key for many machine-learning and data-driven approaches. We provide a parameter-agnostic theoretical framework that enables this task ensuring (i) that a categorical distribution of Maximum Entropy under marginal constraints always exists and (ii) that it is unique. The procedure of iterative proportional fitting (IPF) naturally estimates that distribution from any consistent set of marginal constraints directly in the space of probabilities, thus deductively identifying a least-biased characterization of the population. The theoretical framework together with IPF leads to a holistic workflow that enables modeling any class of categorical distributions solely using the phenomenological information provided.  ( 2 min )
    GFlowNet Foundations. (arXiv:2111.09266v2 [cs.LG] UPDATED)
    Generative Flow Networks (GFlowNets) have been introduced as a method to sample a diverse set of candidates in an active learning context, with a training objective that makes them approximately sample in proportion to a given reward function. In this paper, we show a number of additional theoretical properties of GFlowNets. They can be used to estimate joint probability distributions and the corresponding marginal distributions where some variables are unspecified and, of particular interest, can represent distributions over composite objects like sets and graphs. GFlowNets amortize the work typically done by computationally expensive MCMC methods in a single but trained generative pass. They could also be used to estimate partition functions and free energies, conditional probabilities of supersets (supergraphs) given a subset (subgraph), as well as marginal distributions over all supersets (supergraphs) of a given set (graph). We introduce variations enabling the estimation of entropy and mutual information, sampling from a Pareto frontier, connections to reward-maximizing policies, and extensions to stochastic environments, continuous actions and modular energy functions.  ( 2 min )
    Flexible Amortized Variational Inference in qBOLD MRI. (arXiv:2203.05845v2 [eess.IV] UPDATED)
    Streamlined qBOLD acquisitions enable experimentally straightforward observations of brain oxygen metabolism. $R_2^\prime$ maps are easily inferred; however, the Oxygen extraction fraction (OEF) and deoxygenated blood volume (DBV) are more ambiguously determined from the data. As such, existing inference methods tend to yield very noisy and underestimated OEF maps, while overestimating DBV. This work describes a novel probabilistic machine learning approach that can infer plausible distributions of OEF and DBV. Initially, we create a model that produces informative voxelwise prior distribution based on synthetic training data. Contrary to prior work, we model the joint distribution of OEF and DBV through a scaled multivariate logit-Normal distribution, which enables the values to be constrained within a plausible range. The prior distribution model is used to train an efficient amortized variational Bayesian inference model. This model learns to infer OEF and DBV by predicting real image data, with few training data required, using the signal equations as a forward model. We demonstrate that our approach enables the inference of smooth OEF and DBV maps, with a physiologically plausible distribution that can be adapted through specification of an informative prior distribution. Other benefits include model comparison (via the evidence lower bound) and uncertainty quantification for identifying image artefacts. Results are demonstrated on a small study comparing subjects undergoing hyperventilation and at rest. We illustrate that the proposed approach allows measurement of gray matter differences in OEF and DBV and enables voxelwise comparison between conditions, where we observe significant increases in OEF and $R_2^\prime$ during hyperventilation.  ( 2 min )
    Multiscale Clustering of Hyperspectral Images Through Spectral-Spatial Diffusion Geometry. (arXiv:2103.15783v2 [cs.LG] UPDATED)
    Clustering algorithms partition a dataset into groups of similar points. The primary contribution of this article is the Multiscale Spatially-Regularized Diffusion Learning (M-SRDL) clustering algorithm, which uses spatially-regularized diffusion distances to efficiently and accurately learn multiple scales of latent structure in hyperspectral images. The M-SRDL clustering algorithm extracts clusterings at many scales from a hyperspectral image and outputs these clusterings' variation of information-barycenter as an exemplar for all underlying cluster structure. We show that incorporating spatial regularization into a multiscale clustering framework results in smoother and more coherent clusters when applied to hyperspectral data, yielding more accurate clustering labels.  ( 2 min )
    Sliced gradient-enhanced Kriging for high-dimensional function approximation. (arXiv:2204.03562v1 [stat.ML])
    Gradient-enhanced Kriging (GE-Kriging) is a well-established surrogate modelling technique for approximating expensive computational models. However, it tends to get impractical for high-dimensional problems due to the large inherent correlation matrix and the associated high-dimensional hyper-parameter tuning problem. To address these issues, we propose a new method in this paper, called sliced GE-Kriging (SGE-Kriging) for reducing both the size of the correlation matrix and the number of hyper-parameters. Firstly, we perform a derivative-based global sensitivity analysis to detect the relative importance of each input variable with respect to model response. Then, we propose to split the training sample set into multiple slices, and invoke Bayes' theorem to approximate the full likelihood function via a sliced likelihood function, in which multiple small correlation matrices are utilized to describe the correlation of the sample set. Additionally, we replace the original high-dimensional hyper-parameter tuning problem with a low-dimensional counterpart by learning the relationship between the hyper-parameters and the global sensitivity indices. Finally, we validate SGE-Kriging by means of numerical experiments with several benchmarks problems. The results show that the SGE-Kriging model features an accuracy and robustness that is comparable to the standard one but comes at much less training costs. The benefits are most evident in high-dimensional problems.
    Data blurring: sample splitting a single sample. (arXiv:2112.11079v2 [stat.ME] UPDATED)
    Suppose we observe a random vector $X$ from some distribution $P$ in a known family with unknown parameters. We ask the following question: when is it possible to split $X$ into two parts $f(X)$ and $g(X)$ such that neither part is sufficient to reconstruct $X$ by itself, but both together can recover $X$ fully, and the joint distribution of $(f(X),g(X))$ is tractable? As one example, if $X=(X_1,\dots,X_n)$ and $P$ is a product distribution, then for any $m<n$, we can split the sample to define $f(X)=(X_1,\dots,X_m)$ and $g(X)=(X_{m+1},\dots,X_n)$. Rasines and Young (2021) offers an alternative route of accomplishing this task through randomization of $X$ with additive Gaussian noise which enables post-selection inference in finite samples for Gaussian distributed data and asymptotically for non-Gaussian additive models. In this paper, we offer a more general methodology for achieving such a split in finite samples by borrowing ideas from Bayesian inference to yield a (frequentist) solution that can be viewed as a continuous analog of data splitting. We call our method data blurring, as an alternative to data splitting, data carving and p-value masking. We exemplify the method on a few prototypical applications, such as post-selection inference for trend filtering and other regression problems.
    Online Bootstrap Inference For Policy Evaluation in Reinforcement Learning. (arXiv:2108.03706v2 [stat.ML] UPDATED)
    The recent emergence of reinforcement learning has created a demand for robust statistical inference methods for the parameter estimates computed using these algorithms. Existing methods for statistical inference in online learning are restricted to settings involving independently sampled observations, while existing statistical inference methods in reinforcement learning (RL) are limited to the batch setting. The online bootstrap is a flexible and efficient approach for statistical inference in linear stochastic approximation algorithms, but its efficacy in settings involving Markov noise, such as RL, has yet to be explored. In this paper, we study the use of the online bootstrap method for statistical inference in RL. In particular, we focus on the temporal difference (TD) learning and Gradient TD (GTD) learning algorithms, which are themselves special instances of linear stochastic approximation under Markov noise. The method is shown to be distributionally consistent for statistical inference in policy evaluation, and numerical experiments are included to demonstrate the effectiveness of this algorithm at statistical inference tasks across a range of real RL environments.  ( 2 min )
    Understanding Dynamics of Nonlinear Representation Learning and Its Application. (arXiv:2106.14836v3 [cs.LG] UPDATED)
    Representations of the world environment play a crucial role in artificial intelligence. It is often inefficient to conduct reasoning and inference directly in the space of raw sensory representations, such as pixel values of images. Representation learning allows us to automatically discover suitable representations from raw sensory data. For example, given raw sensory data, a deep neural network learns nonlinear representations at its hidden layers, which are subsequently used for classification at its output layer. This happens implicitly during training through minimizing a supervised or unsupervised loss. In this paper, we study the dynamics of such implicit nonlinear representation learning. We identify a pair of a new assumption and a novel condition, called the common model structure assumption and the data-architecture alignment condition. Under the common model structure assumption, the data-architecture alignment condition is shown to be sufficient for the global convergence and necessary for the global optimality. Moreover, our theory explains how and when increasing the network size does and does not improve the training behaviors in the practical regime. Our results provide practical guidance for designing a model structure: e.g., the common model structure assumption can be used as a justification for using a particular model structure instead of others. We also derive a new training framework, which satisfies the data-architecture alignment condition by automatically modifying any given training algorithm. Given a standard training algorithm, the framework running its modified version is empirically shown to maintain competitive test performances while providing global convergence guarantees for deep residual neural networks with convolutions, skip connections, and batch normalization with datasets, including MNIST, CIFAR-10, CIFAR-100, Semeion, KMNIST and SVHN.  ( 3 min )
    VNIbCReg: VICReg with Neighboring-Invariance and better-Covariance Evaluated on Non-stationary Seismic Signal Time Series. (arXiv:2204.02697v2 [cs.LG] UPDATED)
    One of the latest self-supervised learning (SSL) methods, VICReg, showed a great performance both in the linear evaluation and the fine-tuning evaluation. However, VICReg is proposed in computer vision and it learns by pulling representations of random crops of an image while maintaining the representation space by the variance and covariance loss. However, VICReg would be ineffective on non-stationary time series where different parts/crops of input should be differently encoded to consider the non-stationarity. Another recent SSL proposal, Temporal Neighborhood Coding (TNC) is effective for encoding non-stationary time series. This study shows that a combination of a VICReg-style method and TNC is very effective for SSL on non-stationary time series, where a non-stationary seismic signal time series is used as an evaluation dataset.  ( 2 min )
    Bidimensional linked matrix factorization for pan-omics pan-cancer analysis. (arXiv:2002.02601v2 [stat.ML] UPDATED)
    Several modern applications require the integration of multiple large data matrices that have shared rows and/or columns. For example, cancer studies that integrate multiple omics platforms across multiple types of cancer, pan-omics pan-cancer analysis, have extended our knowledge of molecular heterogenity beyond what was observed in single tumor and single platform studies. However, these studies have been limited by available statistical methodology. We propose a flexible approach to the simultaneous factorization and decomposition of variation across such bidimensionally linked matrices, BIDIFAC+. This decomposes variation into a series of low-rank components that may be shared across any number of row sets (e.g., omics platforms) or column sets (e.g., cancer types). This builds on a growing literature for the factorization and decomposition of linked matrices, which has primarily focused on multiple matrices that are linked in one dimension (rows or columns) only. Our objective function extends nuclear norm penalization, is motivated by random matrix theory, gives an identifiable decomposition under relatively mild conditions, and can be shown to give the mode of a Bayesian posterior distribution. We apply BIDIFAC+ to pan-omics pan-cancer data from TCGA, identifying shared and specific modes of variability across 4 different omics platforms and 29 different cancer types.  ( 2 min )
    Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning. (arXiv:2004.10888v6 [cs.LG] UPDATED)
    We present a mean-variance policy iteration (MVPI) framework for risk-averse control in a discounted infinite horizon MDP optimizing the variance of a per-step reward random variable. MVPI enjoys great flexibility in that any policy evaluation method and risk-neutral control method can be dropped in for risk-averse control off the shelf, in both on- and off-policy settings. This flexibility reduces the gap between risk-neutral control and risk-averse control and is achieved by working on a novel augmented MDP directly. We propose risk-averse TD3 as an example instantiating MVPI, which outperforms vanilla TD3 and many previous risk-averse control methods in challenging Mujoco robot simulation tasks under a risk-aware performance metric. This risk-averse TD3 is the first to introduce deterministic policies and off-policy learning into risk-averse reinforcement learning, both of which are key to the performance boost we show in Mujoco domains.  ( 2 min )
    DeepTensor: Low-Rank Tensor Decomposition with Deep Network Priors. (arXiv:2204.03145v1 [stat.AP])
    DeepTensor is a computationally efficient framework for low-rank decomposition of matrices and tensors using deep generative networks. We decompose a tensor as the product of low-rank tensor factors (e.g., a matrix as the outer product of two vectors), where each low-rank tensor is generated by a deep network (DN) that is trained in a self-supervised manner to minimize the mean-squared approximation error. Our key observation is that the implicit regularization inherent in DNs enables them to capture nonlinear signal structures (e.g., manifolds) that are out of the reach of classical linear methods like the singular value decomposition (SVD) and principal component analysis (PCA). Furthermore, in contrast to the SVD and PCA, whose performance deteriorates when the tensor's entries deviate from additive white Gaussian noise, we demonstrate that the performance of DeepTensor is robust to a wide range of distributions. We validate that DeepTensor is a robust and computationally efficient drop-in replacement for the SVD, PCA, nonnegative matrix factorization (NMF), and similar decompositions by exploring a range of real-world applications, including hyperspectral image denoising, 3D MRI tomography, and image classification. In particular, DeepTensor offers a 6dB signal-to-noise ratio improvement over standard denoising methods for signals corrupted by Poisson noise and learns to decompose 3D tensors 60 times faster than a single DN equipped with 3D convolutions.  ( 2 min )
    MultiAuto-DeepONet: A Multi-resolution Autoencoder DeepONet for Nonlinear Dimension Reduction, Uncertainty Quantification and Operator Learning of Forward and Inverse Stochastic Problems. (arXiv:2204.03193v1 [stat.ML])
    A new data-driven method for operator learning of stochastic differential equations(SDE) is proposed in this paper. The central goal is to solve forward and inverse stochastic problems more effectively using limited data. Deep operator network(DeepONet) has been proposed recently for operator learning. Compared to other neural networks to learn functions, it aims at the problem of learning nonlinear operators. However, it can be challenging by using the original model to learn nonlinear operators for high-dimensional stochastic problems. We propose a new multi-resolution autoencoder DeepONet model referred to as MultiAuto-DeepONet to deal with this difficulty with the aid of convolutional autoencoder. The encoder part of the network is designed to reduce the dimensionality as well as discover the hidden features of high-dimensional stochastic inputs. The decoder is designed to have a special structure, i.e. in the form of DeepONet. The first DeepONet in decoder is designed to reconstruct the input function involving randomness while the second one is used to approximate the solution of desired equations. Those two DeepONets has a common branch net and two independent trunk nets. This architecture enables us to deal with multi-resolution inputs naturally. By adding $L_1$ regularization to our network, we found the outputs from the branch net and two trunk nets all have sparse structures. This reduces the number of trainable parameters in the neural network thus making the model more efficient. Finally, we conduct several numerical experiments to illustrate the effectiveness of our proposed MultiAuto-DeepONet model with uncertainty quantification.  ( 2 min )
    A novel nonconvex, smooth-at-origin penalty for statistical learning. (arXiv:2204.03123v1 [stat.ML])
    Nonconvex penalties are utilized for regularization in high-dimensional statistical learning algorithms primarily because they yield unbiased or nearly unbiased estimators for the parameters in the model. Nonconvex penalties existing in the literature such as SCAD, MCP, Laplace and arctan have a singularity at origin which makes them useful also for variable selection. However, in several high-dimensional frameworks such as deep learning, variable selection is less of a concern. In this paper, we present a nonconvex penalty which is smooth at origin. The paper includes asymptotic results for ordinary least squares estimators regularized with the new penalty function, showing asymptotic bias that vanishes exponentially fast. We also conducted an empirical study employing deep neural network architecture on three datasets and convolutional neural network on four datasets. The empirical study showed better performance for the new regularization approach in five out of the seven datasets.  ( 2 min )
    Statistical Model Criticism of Variational Auto-Encoders. (arXiv:2204.03030v1 [cs.LG])
    We propose a framework for the statistical evaluation of variational auto-encoders (VAEs) and test two instances of this framework in the context of modelling images of handwritten digits and a corpus of English text. Our take on evaluation is based on the idea of statistical model criticism, popular in Bayesian data analysis, whereby a statistical model is evaluated in terms of its ability to reproduce statistics of an unknown data generating process from which we can obtain samples. A VAE learns not one, but two joint distributions over a shared sample space, each exploiting a choice of factorisation that makes sampling tractable in one of two directions (latent-to-data, data-to-latent). We evaluate samples from these distributions, assessing their (marginal) fit to the observed data and our choice of prior, and we also evaluate samples through a pipeline that connects the two distributions starting from a data sample, assessing whether together they exploit and reveal latent factors of variation that are useful to a practitioner. We show that this methodology offers possibilities for model selection qualitatively beyond intrinsic evaluation metrics and at a finer granularity than commonly used statistics can offer.  ( 2 min )
    What You See is What You Get: Distributional Generalization for Algorithm Design in Deep Learning. (arXiv:2204.03230v1 [cs.LG])
    We investigate and leverage a connection between Differential Privacy (DP) and the recently proposed notion of Distributional Generalization (DG). Applying this connection, we introduce new conceptual tools for designing deep-learning methods that bypass "pathologies" of standard stochastic gradient descent (SGD). First, we prove that differentially private methods satisfy a "What You See Is What You Get (WYSIWYG)" generalization guarantee: whatever a model does on its train data is almost exactly what it will do at test time. This guarantee is formally captured by distributional generalization. WYSIWYG enables principled algorithm design in deep learning by reducing $\textit{generalization}$ concerns to $\textit{optimization}$ ones: in order to mitigate unwanted behavior at test time, it is provably sufficient to mitigate this behavior on the train data. This is notably false for standard (non-DP) methods, hence this observation has applications even when privacy is not required. For example, importance sampling is known to fail for standard SGD, but we show that it has exactly the intended effect for DP-trained models. Thus, with DP-SGD, unlike with SGD, we can influence test-time behavior by making principled train-time interventions. We use these insights to construct simple algorithms which match or outperform SOTA in several distributional robustness applications, and to significantly improve the privacy vs. disparate impact trade-off of DP-SGD. Finally, we also improve on known theoretical bounds relating differential privacy, stability, and distributional generalization.  ( 2 min )
    Composite Spatial Monte Carlo Integration Based on Generalized Least Squares. (arXiv:2204.03248v1 [stat.CO])
    Although evaluation of the expectations on the Ising model is essential in various applications, this is frequently infeasible because of intractable multiple summations (or integrations). Spatial Monte Carlo integration (SMCI) is a sampling-based approximation, and can provide high-accuracy estimations for such intractable expectations. To evaluate the expectation of a function of variables in a specific region (called target region), SMCI considers a larger region containing the target region (called sum region). In SMCI, the multiple summation for the variables in the sum region is precisely executed, and that in the outer region is evaluated by the sampling approximation such as the standard Monte Carlo integration. It is guaranteed that the accuracy of the SMCI estimator is monotonically improved as the size of the sum region increases. However, a haphazard expansion of the sum region could cause a combinatorial explosion. Therefore, we hope to improve the accuracy without such region expansion. In this study, based on the theory of generalized least squares, a new effective method is proposed by combining multiple SMCI estimators. The validity of the proposed method is demonstrated theoretically and numerically. The results indicate that the proposed method can be effective in the inverse Ising problem (or Boltzmann machine learning).  ( 2 min )

  • Open

    [D] Any guesstimates for how much a DALLE 2 generation will eventually cost?
    Just based on the estimated running costs of GPT3, and then whatever profit gets applied on top of that, are there any estimates for what openai will eventually charge for image generation? submitted by /u/EugeneJudo [link] [comments]  ( 1 min )
    Problem with CVPR template and arXiv? [D]
    I don't know what would be the best place to post this. But I am having trouble uploading an Overleaf manuscript to arXiv based on the CVPR 2022 template. I am getting the following error. Does anyone have any ideas? ​ https://preview.redd.it/y9jq0rbysds81.png?width=1614&format=png&auto=webp&s=16be3d32468837f649e846ec8a309dab2854c762 submitted by /u/avd4292 [link] [comments]
    [R]Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language - Google Apr 2022
    Paper: https://arxiv.org/abs/2204.00598 https://socraticmodels.github.io/ Twitter: https://twitter.com/andyzengtweets/status/1512089759497269251 Abstract: " Large foundation models can exhibit unique capabilities depending on the domain of data they are trained on. While these domains are generic, they may only barely overlap. For example, visual-language models (VLMs) are trained on Internet-scale image captions, but large language models (LMs) are further trained on Internet-scale text with no images (e.g. from spreadsheets, to SAT questions). As a result, these models store different forms of commonsense knowledge across different domains. In this work, we show that this model diversity is symbiotic, and can be leveraged to build AI systems with structured Socratic dialogue -- in whi…  ( 1 min )
    [D] What to do next after the sanity check?
    I have two years of time-series data taken from two sensors which I have split into 80/10/10 non-overlapping train/val/test splits. The task is to denoise one sensor data into another and I am handling it as a regression problem. I am following this website and considered an already published model (5 convolutional and 1 fully connected layer) which is trained on a similar dataset and same task. For the sake of sanity check, as per the website, I have trained the model on a subset of trainset (3 months) and tried to overfit it (while evaluating on complete val set), which works fine. However, I am not sure what to do next from this point on? Shall I just train on the complete trainset now? Or do I increase the layers or play with other hyper params to find more details about my regression problem/data? I would really appreciate your comments. Thank you. PS. The target value is sparse i.e. more than 85% of the time it is zero. submitted by /u/muaz_usmani [link] [comments]  ( 1 min )
    [D] Leaving ML for Software Engineering?
    I'm keen to hear from people who have made the transition from ML Research/Engineering positions to software engineering roles (or who are considering it). What were your reasons for doing it and did you regret it? I see so many articles about transition from software to ML but none about going the opposite direction. Context: I've been working as an ML Engineer for a little over a year, and I'm just... not enjoying it. I want to love my job so badly as I like my boss, my colleagues, and the company (and I'm paid quite well for my level), but I just don't. I feel like the type of work I'm doing is not very smart and yet it's extremely draining: I spend so many hours just looking at loss curves, tweaking features and parameters. I'm somehow bored and stressed at the same time, because I don't enjoy the work and yet I feel the pressure to produce good models, and when they don't work as expected I can't help but take it personally as if if I just tried hard enough they would work. I find that the days were I end up having to take care of more purely engineering tasks I just have a lot more fun and I finish the day more satisfied and less drained. I think I just want to build something instead of spending hours banging my head against shit data. I would love to hear from people who feel or have felt the same way because whenever I speak about this with friends who are in ML they look at me like I'm a lunatic for wanting to leave it for software engineering. I'm obviously aware that swe roles are not all fun and games, but I just feel like there's been an excessive push for so many people to move to ML as it's "cool" and "smart" when in reality they're just different things who are going to suit different people. submitted by /u/hedy-m [link] [comments]  ( 5 min )
    [D] Is virtual ICLR 2022 worth paying for
    The 2022 ICLR conference at the end of this month is virtual and costs $100 to attend. I was thinking of attending for networking opportunities but I’m not sure. Is it a good idea to go for it? submitted by /u/sybar142857 [link] [comments]  ( 1 min )
    [D] Triplet vs. Contrastive Loss
    The online triplet mining strategy is more efficient than the offline one. It implies "getting a batch of n samples and their associated labels, and form triplets on the fly." Here is an article about Triplet vs. Contrastive Loss comparison and its efficient implementation. I would like to know your feedback. submitted by /u/devzaya [link] [comments]
    [P] Animated Character Generator
    Hello everybody, I'd like to share the latest machine learning project of mine. It allows one to generate animated characters in the style of old video game consoles. Here are some examples. I would appreciate any feedback. https://i.redd.it/sig6ilpi8bs81.gif https://i.redd.it/xaf906qi8bs81.gif https://i.redd.it/8v7lz2qi8bs81.gif submitted by /u/ie9res [link] [comments]
    [D] Annotation formats for image annotations?
    Hey ML people, what is your favorite annotation format for image bounding boxes/labels? I know coco is very popular, we are rethinking parts of our data infrastructure wondering what everyone is using. Our platform hosts hundreds of millions of images. Ideal format would support running queries on data stored in a Data lake If the format supports 3D annotation types that is even better. Thanks for your insights in advance. submitted by /u/mmuppidi [link] [comments]  ( 1 min )
    [N] OpenAI's DALL-E 2 paper "Hierarchical Text-Conditional Image Generation with CLIP Latents" has been updated with added section "Training details" (see Appendix C)
    New version of paper is linked to in the DALL-E 2 blog post and also here (pdf file format). Tweet announcing updated paper. Older version of paper (pdf file format). Original Reddit post. submitted by /u/Wiskkey [link] [comments]  ( 1 min )
    Dense Passage Retriever(DPR) Open-QA System [P]
    Hi, I made a video explaining Dense Passage Retriever(DPR) paper. We specifically explain the End to End QA system suggested in the latter part of the paper which discusses how to build an Open-QA system using dense retrievers. DPR was one of the first papers that discussed building dense retrievers using QA pairs only and didn't require a big pretraining computational setup like ORQA or REALM. It is currently used in a lot of places as a dense retriever. You can find Hugginface and Haystack implementations also. This video is part of a series on Open-QA using dense retrievers. We have made 2 videos on DPR. In the latter, we discuss how to build a dense retriever from scratch. Thanks for the support and it would be great if you could give any feedback. https://www.youtube.com/watch?v=rvcyyJNjPU0 submitted by /u/infiniteakashe [link] [comments]  ( 1 min )
    [D] Works that can process variable input resolution of images
    Hi. I'm looking for existing computer vision papers /networks that can process variable input resolution. Can anyone share me with similar works? For example, a network/layer N can take both inputs with H*W and 2H*2H individually and give correct prediction. One of them I know is ROI pooling used in Faster RCNN. Thanks very much. submitted by /u/vincent341 [link] [comments]  ( 1 min )
    [D] Machine Learning Engineers - What Does Your Day Involve?
    Hey, I'm looking to transition from my current role as a data scientist to one that has a machine learning engineering focus. I was wondering if anyone could provide insights into how they plan their day, or what activities you do throughout the day/week. I'd be particularly interested to understand the balance between deploying models/writing production worthy code and your time spent learning/developing given the field is moving so fast. submitted by /u/MenArePigs69 [link] [comments]  ( 4 min )
    [D] Bayesian Non-Parametrics for Ranking?
    I am currently sitting at a difficult machine-learning problem that I have found no literature on how to solve it. I am given n datapoints x_1,...,x_n that are ordered according to a ranking preference rank(x_1)<rank(x_2)<...<rank(x_n). I am assuming there exists a function f, such, that f(x_i)<f(x_i+1). I am now searching a Bayesian non-parametric model that gives the posterior probability of functions f that abide f(x_i)<f(x_i+1), so that i can estimate the relative rank preferences at new points. I have tried out a few things. The naive approach is using a GP prior on f. Unfortunately, computing the posterior distribution p(f(x_1), ... f(x_n)| f(x_1)<...<f(x_n)) has no closed form solution (it is a normal distribution with N linear constraints, which is absolutely terrible to sample from). This makes computing conditional distributions for predictions very challenging. I am currently approximating the solution by using a GP regression model with label y_i = rank(x_i)=i. But this is systematically under-estimating the shape-variation, due to the fact that it adds the assumption that function values between ranks are equidistant. Is there any known approach how to do this? submitted by /u/Ulfgardleo [link] [comments]  ( 2 min )
    [R] Video Diffusion Models
    From the webpage: We present results on video generation using diffusion models. We propose an architecture for video diffusion models which is a natural extension of the standard image architecture. We show that this architecture is effective for jointly training from image and video data. To generate long and higher resolution videos we introduce a new conditioning technique that performs better than previously proposed methods. We present results on text-conditioned video generation and state-of-the-art results on an unconditional video generation benchmark. Paper: https://arxiv.org/abs/2204.03458 https://video-diffusion.github.io/ submitted by /u/hardmaru [link] [comments]  ( 1 min )
    [D] how to decide publication venue
    How to decide if a paper is appropriate for a specific venue? Moreover, how would you categorize the difference between a good NiPs publication and a good CvPR or ICCV publication? submitted by /u/LifeguardDismal142 [link] [comments]  ( 1 min )
    [N] LR Warmup for PyTorch
    ​ RadamWarmup + CosineAnnealingLR + StepLR Colab Link pytorch_warmup v0.1.0 was released. submitted by /u/TonyY_RIMCS [link] [comments]
  • Open

    RL for dynamic environments
    In their 2019 review article in Nature Machine Intelligence, Neftci and Averbeck point out, “Most work in biological systems has focused on simple learning problems… where flexibility and ongoing learning are important, similar to real-world learning problems. In contrast, most work in artificial agents has focused on learning a single complex problem in a static environment.” Are there RL approaches designed to handle dynamic environments with changing reward functions? I did find this earlier post, but thought I'd ask if anyone had other suggested lines of reading. Thanks! submitted by /u/Careless-Argument-37 [link] [comments]  ( 1 min )
    Observationspace max Size?
    I want to give my AI as many information as possible. Can there be an Issue with a too large observation space? submitted by /u/Willing-Classroom735 [link] [comments]  ( 1 min )
    "UC Berkeley’s Pieter Abbeel receives 2021 ACM Prize in Computing" (for DRL robotics)
    submitted by /u/gwern [link] [comments]
    Action Space Dimensional reduction for better convergence
    I am working on a project in which a robots learns its motion. for example Bipedal robot learns to walk on a straight line by learning to adjust the torque and angular velocity of each joint. However, the robot I am working on has complex architecture. It has 10 joints instead of 2, most importantly all of these joints work simultaneously and coherently to produce a desired motion. The Problem I am facing is that, the robot has ten joints and each joint can move between -450 to +450 for simplicity let me define State and Actions of the system State = -450 to +450 -------------> normalization ------------------> -10 to 10 Actions = choose an angle between -10 to 10 for each joint Total Action space for each State at each time step = 10 (no of joints each can move between -10 to 10 at each time step) * 360 (total time steps for a single motion) = 3600 (Output: No of angles required to generate a motion) I am using TD3 to solve this conundrum. The Action space is too large how can I reduce the action space? submitted by /u/SAM_Baloch [link] [comments]  ( 2 min )
    Dynamic action space in RL
    I am doing a project and there is a problem with dynamic action space A complete action space can be divided into four parts. In each state, the action to be selected is one of them For example, the total discrete action space length is 1000, which can be divided into four parts, [0:300], [301:500],[501:900],[901:1000] For state 1, action_ space is [0:300], State2, action_ space is [301:500], etc For this idea, I have several ideas at present: There is no restriction at all. The legal actions of all States are [1:1000], but it may take longer train time and there is not much innovation Soft constraint, for example, if state1 selects an illegal action, such as one action in [251: 500], reward gives a negative value, but it is also not innovative Hard constraint, use action space mask in each state, but I don't know how to do it.. Is there any relevant article? It is directly divided into four action spaces and uses multi-agent cooperative relationship learning ​ Any suggestions? thanks! submitted by /u/RangerWYR [link] [comments]  ( 2 min )
    Any paper suggestions??
    Hi everyone, i have to define a project for my master degree, so i'm looking for the best papers published since 2018-2019 until now in Reinforcement Learning . Do you have any suggestions, titles or projects that i can check? submitted by /u/acaviedes15 [link] [comments]  ( 1 min )
  • Open

    Responsible AI in a Global Context
    submitted by /u/john133435 [link] [comments]
    AI website that transitions photos into video?
    Remember using a website like a year ago, where you could put in 2 or more images, and it would sort of make a transition between the two with AI. Then you could export the video and such. You could also very extensively edit human faces and change small features on a scale from 1-100. The features where incredibly specific like brow bone and nasal bridge. If anyone has the website I would appreciate it!! submitted by /u/yungbenz0_bajs [link] [comments]  ( 1 min )
    How Artificial Intelligence Is Impacting Today’s Businesses
    submitted by /u/mr_j_b [link] [comments]
    Alibaba’s AI tool to improve efficiency of China’s waste-to-energy plants
    submitted by /u/mr_j_b [link] [comments]
    Best GAN for Tabular-data
    What in your opinion is the best GAN for tabular-data. Please include any references if you have any. submitted by /u/ily_jk [link] [comments]
    Supercharged UI for MLflow
    Hi guys, we've built a plugin that seamlessly reads MLflow logs and provides a beautiful UI to compare multiple runs with just a few clicks. You can: filter runs with a super versatile fully pythonic search group and aggregate your metrics / images We are trying make it work seamlessly with MLflow and complement its other awesome features 🎉 Here is more info about it https://aimstack.io/aimlflow Would love your feedback!! submitted by /u/ManeSa [link] [comments]  ( 1 min )
    Takeaways From 3 Years Working In Machine Learning
    submitted by /u/elcric_krej [link] [comments]
    OpenAI 's new model DALL·E 2 is amazing!
    submitted by /u/OnlyProggingForFun [link] [comments]  ( 1 min )
    The AI in a jar
    submitted by /u/bendee983 [link] [comments]
    Can Computers Learn Common Sense?
    submitted by /u/estasfuera [link] [comments]
    Metaverse weekly digest: Shiba Inu’s metaverse, Alibaba’s $60 million VR investment
    submitted by /u/bent_out_of_shape_ [link] [comments]
    Meet ‘ChestLink’, The First Autonomous AI Medical Imaging Application by ‘Oxipit’ That Received CE Mark Approval in the EU
    ​ https://preview.redd.it/e2q3jit3c8s81.png?width=1024&format=png&auto=webp&s=837aa6256647df6fb8777a02b04313a38428f573 The most common diagnostic imaging test conducted in emergency rooms is chest radiography. Providing automated preliminary read helpers to physicians might speed up surgery, enhance accuracy, and lower healthcare costs. An artificial intelligence tool that interprets chest X-rays without the intervention of a radiologist received regulatory approval in the European Union this week, marking a first for a wholly autonomous medical imaging AI, according to ‘Oxipit‘, the developer of this tool. It’s a watershed moment for AI, and it’s more than likely to spark debate, given that radiologists have spent the last few years working to fully automate parts of their jobs. Continue Reading submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
  • Open

    Large-Scale Matrix Factorization on TPUs
    Posted by Harsh Mehta, Software Engineer, Google Research Matrix factorization is one of the oldest, yet still widely used, techniques for learning how to recommend items such as songs or movies from user ratings. In its basic form, it approximates a large, sparse (i.e., mostly empty) matrix of user-item interactions with a product of two smaller, denser matrices representing learned item and user features. These dense matrices, in turn, can be used to recommend items to a user with which they haven't interacted before. Despite its algorithmic simplicity, matrix factorization can still achieve competitive performance in recommender benchmarks. Alternating least squares (ALS), and especially its implicit variation, is a fundamental algorithm to learn the parameters of matrix factorization…  ( 9 min )
  • Open

    Rock On: Scientists Use AI to Improve Sequestering Carbon Underground
    A team of scientists have created a new AI-based tool to help lock up greenhouse gases like CO2 in porous rock formations faster and more precisely than ever before. Carbon capture technology, also referred to as carbon sequestration, is a climate change mitigation method that redirects CO2 emitted from power plants back underground. While doing Read article > The post Rock On: Scientists Use AI to Improve Sequestering Carbon Underground appeared first on NVIDIA Blog.  ( 4 min )
  • Open

    Build a custom entity recognizer for PDF documents using Amazon Comprehend
    In many industries, it’s critical to extract custom entities from documents in a timely manner. This can be challenging. Insurance claims, for example, often contain dozens of important attributes (such as dates, names, locations, and reports) sprinkled across lengthy and dense documents. Manually scanning and extracting such information can be error-prone and time-consuming. Rule-based software […]  ( 7 min )
    Getting started with the Amazon Kendra Box connector
    Amazon Kendra is a highly accurate and easy-to-use intelligent search service powered by machine learning (ML). Amazon Kendra offers a suite of data source connectors to simplify the process of ingesting and indexing your content, wherever it resides. For many organizations, Box Content Cloud is a core part of their content storage and lifecycle management […]  ( 6 min )
  • Open

    Hilbert transform and Mathematica
    The Hilbert transform of a function f(t) is a function fH(x) defined [1] by The integral must be interpreted in the sense of the Cauchy principal value: The integrand is not absolutely integrable because of the singularity at x and so the value of the integral depends on how you handle the singularity. The Cauchy […] Hilbert transform and Mathematica first appeared on John D. Cook.  ( 2 min )
    Visual integration
    The plot below is of a meromorphic function f(z). That is, the function f(z) is analytic except possibly at poles, and the colors represent the phase angles, the values of θ if you write the function values in polar form. What is the value of the integral where C is the perimeter of the square? […] Visual integration first appeared on John D. Cook.  ( 2 min )
  • Open

    Dense Passage Retriever(DPR) Open-QA System
    Hi, I made a video explaining Dense Passage Retriever(DPR) paper. We specifically explain the End to End QA system suggested in the latter part of the paper which discusses how to build an Open-QA system using dense retrievers. DPR was one of the first papers that discussed building dense retrievers using QA pairs only and didn't require a big pretraining computational setup like ORQA or REALM. It is currently used in a lot of places as a dense retriever. You can find Hugginface and Haystack implementations also. This video is part of a series on Open-QA using dense retrievers. We have made 2 videos on DPR. In the latter, we discuss how to build a dense retriever from scratch. Thanks for the support and it would be great if you could give any feedback. https://www.youtube.com/watch?v=rvcyyJNjPU0 submitted by /u/infiniteakashe [link] [comments]  ( 1 min )
  • Open

    Artificial Intelligence: Benefits for Automation Testing
    AI has been making a lot of noise of late, especially in the context of software development. Of course, this topic is quite wide, but in this article, we shall focus our attention on AI-driven automation testing. Let us start with understanding what is AI and automation testing. Automation testing refers to the process of… Read More »Artificial Intelligence: Benefits for Automation Testing The post Artificial Intelligence: Benefits for Automation Testing appeared first on Data Science Central.  ( 3 min )

  • Open

    Attend the 2022 National Autonomous Vehicle Expo (April 16-17th)
    Interested in the future of autonomous vehicles? Want to know more about the impacts of this technology? Join us on April 16-17th at the 2022 National Autonomous Vehicle Expo to discover the engineering, ethics, and policymaking of this emerging technology. The virtual expo consists of speaker and workshop sessions led by industry-leading companies, such as NVIDIA, Waymo, and Motional, as well as distinguished programs/organizations like MIT Beaverworks and InspiritAI. You will also have the opportunity to compete in our hackathon, where you can win a variety of cool prizes! Even if you don't participate in the hackathon, there will be free merchandise and giveaways throughout the expo! To register and/or view more information about the event, head over to avexpo.org. For hackathon-specific registration, you can visit our devpost at https://autonomous-vehicle-expo.devpost.com/. Hope to see you all there! ​ https://preview.redd.it/qgfx3sv837s81.png?width=1080&format=png&auto=webp&s=ed19d68bdff274de188deaa8f4338c864943b508 https://preview.redd.it/a4kbsrv837s81.png?width=1080&format=png&auto=webp&s=6a19347b708d822d8dca226beb82cbdc73ffbb87 https://preview.redd.it/s9nghsv837s81.png?width=1080&format=png&auto=webp&s=cf32712b9985564b455be53e02fd00589725ad2c submitted by /u/avexpo22 [link] [comments]  ( 1 min )
    Resources about cognitive theories
    Hi! I am new to the community, and was wondering what y'all's favorite resources were to learn about cognitive theories and how they will shape future AI advancements. YouTube channels would be great. submitted by /u/Apprehensive-Candy97 [link] [comments]
    Andrew Yang & Yuval Noah Harari: Tech, Public Policy & the Future of Work
    submitted by /u/john133435 [link] [comments]
    AI News | AI News | Why AI Made 40,000 New Chemical Weapons Compounds in 6 Hours | Cancer Treatment AI Breakthrough
    submitted by /u/getrich_or_diemining [link] [comments]
    How to create a BOT for a existing game?
    I wanna create a bot for a game which basically is: get resources, craft itens, sell then. The problem is, some itens has different qualities, and I wanna automatize this process, to identify the good stuff to keep, and sell the bad stuff. What's the best way to do that? I work with desktop systems, so i'm not familiar with this kind of stuff, but I usually read about python and some frameworks, what do you guys recommend me to start? submitted by /u/AbbathDoom [link] [comments]  ( 1 min )
    DALL·E 2: A new AI system to create realistic images and art from natural language commands
    submitted by /u/alien128 [link] [comments]
    What are other "technology" fields that is good to learn while studying AI?
    Hello! What do you guys think are other "technology" fields that would be good to study with AI? It is okay as long as it is "tech." What would be the tech field that would be beneficial in the future? My goal is to make a self-aware AI (AGI). I was always fascinated about AI since my childhood, that's why I'm going to pursue this field. Also, I am currently studying Game Development to make a VR Game that hopefully will have humanlike AI in it. I have read a LOT of articles about the future of AI, and Cybersecurity keeps popping up because superintelligent AI needs to be CONTROLLED from hackers (based on the articles) otherwise it is over. What do you guys think would be the tech field that will bring the most changes in the future? submitted by /u/ThatOneEpicAstronaut [link] [comments]  ( 1 min )
    Does this artificial intelligence think like a human?
    submitted by /u/qptbook [link] [comments]
    Artificial intelligence Courses for Healthcare
    We keep on hearing about how artificial intelligence and machine learning is going to revolutionise Medicine. But what’s hype, and what’s realistic? And how can you get involved? The first step is to understand the technology - where it’s well-suited to healthcare (and where it isn’t). When it comes to health care, especially for life and death situations AI has made things very easy for us. However, it is still expected to drastically change the way medicine is practised. It will also replace the surgeries done by the doctors with the surgeries done using Artificial intelligence, making diagnosing complex diseases, genetic issues and many other health problems extremely easy in the future. Here are the best Artificial Intelligence courses for healthcare you can learn in 2022. submitted by /u/maneesh123456 [link] [comments]  ( 1 min )
    Artificial Nightmares: Smithing Stone 6 || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
    Introducing MindSpore 1.6 New Features
    submitted by /u/Creative_Habit_6868 [link] [comments]
    OpenAI's DALL·E 2 ! Text-to-Image Generation Explained
    submitted by /u/OnlyProggingForFun [link] [comments]
    How do I get into the field of AI policy and strategy?
    I've read online that a career in AI policy and strategy is heavily needed and is actually ranked as the number one problem in the future by 80,000 hours. I am choosing which undergraduate degree to pursue in the fall and I'm not sure the best pathway to pursue to work in this field in an extremely high level position. an economics degree? Computer science degree? AI degree? should I pursue one subject until I get a PhD in it or mix with other degrees/certificates? is it a straight forward pathway focused on one subject where I only work in one subject field or is it necessary to pursue and work in other fields as well, what are the typical steps? Also if there is anything else that would be helpful on the pathway or anything you would recommend please let me know. submitted by /u/Key-Lawyer-7586 [link] [comments]  ( 2 min )
    Five Google Chrome Extensions that every Machine Learning / Data Science professional should know about 🚀💯
    submitted by /u/MLtinkerer [link] [comments]  ( 1 min )
  • Open

    Implementation of RL
    Hi all! I am a beginner in RL field and am trying to implement the RL algorithm in the following paper : [1912.04321] Learning to Code: Coded Caching via Deep Reinforcement Learning (arxiv.org) In short, we are trying to achieve the minimum number of transmission of bits from the server to all users Now, after 500 episodes of training the number of transmissions does decrease. But when I implement the same actor critic algorithm this does not happen. In fact, the results seem to be completely random. Here is the plot of the same : https://preview.redd.it/yve3cay7l6s81.png?width=745&format=png&auto=webp&s=36bf4f0aa813a30024128d797acde3b8adc2df30 Although my training parameters are slightly different, I can't understand why this would happen. I used the parameters and pseudo code from this paper: A Deep Reinforcement Learning Approach for Shared Caching | IEEE Conference Publication | IEEE Xplore - which is an extension of the link at the top. ​ Attaching link to my code: https://www.kaggle.com/samarthtiwari123/rl-for-coded-caching Any help would be helpful!! Thanks in advance submitted by /u/samt_123 [link] [comments]  ( 1 min )
    How can I extract the direction a specific agent is facing?
    submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    Flow of gradients through multiple classes
    Naive question: if I define my RL model as a combination of different classes (one class that preprocesses the observation, one class that processes the observation, one class that outputs the actions, etc.), is this going to affect the flow of gradients in PyTorch? The alternative would be to create only one class in which I combine everything submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    Can KL divergence be used as a metric to see the learning progress in PPO?
    In the hyper parameter section of the paper, it is written that step size of Adam is varied according to KL divergence. So I wanted to know is KL divergence the correct metric to be used for observing the learning progress because we have many states for which probabilites of a particular action is either increased or decreased thus taking average KL mixes up a lot of things. submitted by /u/Better-Ad8608 [link] [comments]  ( 1 min )
    Weight decay in policy network for Discrete SAC?
    We’re finding that our network is returning a tensor of NaNs towards the end of training. Adding weight decay solves this issue but reduces learning, was wondering if anyone else had experience with vanishing gradients in off-policy methods or any insight? submitted by /u/TerrificJam [link] [comments]  ( 1 min )
  • Open

    [D] self attention visualization
    Has anyone ever come across seemingly chaotic self attention maps during visualization. If your model is performing well but no insights can be gleaned from the visualization how do you explain it in a paper? submitted by /u/LifeguardDismal142 [link] [comments]  ( 1 min )
    [N] PaLM's (Google's 530B LLM) training costs around $9M to $17M.
    Here's the blogpost estimating the cost. What would it cost you to train PaLM using cloud computing (and you're not Google)? Something around $9M to $17M. submitted by /u/cirqe [link] [comments]  ( 1 min )
    [D] Feature selection methods
    I'm working on a ML project and I'm working with a dataset with 20 columns, for feature selection I just removed one column one by one and looked at the error of the ML outputs for each, then saw when what column is removed gives a lower error and kept repeating that but that didn't seem to help the model at all and the error went down very little. Is this an okay way of doing feature selection is there another way that gives better results. I tried PCA and LDA and Pearson Correlation method as well in Python and that didn't seem to help or is this the best I could do. Thanks! submitted by /u/ihshosv [link] [comments]  ( 1 min )
    [D] Overfitting a sign high learning capacity?
    This is a two part question: If a neural network can overfit the a large dataset is this a sign that a neural network has high learning capacity? If a neural network can overfit a dataset with substantially less parameters than other neural networks developed for the same learning task is this a sign that the neural network has a high learning capacity relative to other datasets? submitted by /u/LifeguardDismal142 [link] [comments]  ( 1 min )
    [D] training cnn with synthetic data. Should i mix synth and real and train from the scratch or pretrain the network with synth and finetune with real?
    I'm doing research on the use of synthetic data for a computer vision task and i have generally always tried to train in a mixed setting from scratch, but i have noticed that in similar papers, researchers always pretrain on synth first and then finetune on real data. Is there a logic behind that? Should i expect better results by finetuning? submitted by /u/TheManveru [link] [comments]  ( 2 min )
    [R] Sampling in Dirichlet Process Mixture Models for Clustering Streaming Data
    Hi Everyone, We have recently published the code for our AISTATS 2022 paper - Sampling in Dirichlet Process Mixture Models for Clustering Streaming Data ​ Video Segmentation Example In our work, we have proposed a solution for clustering streaming data. Unlike 'standard' clustering scenarios, in the streaming case the data stream is possibly infinite, you cannot backtrack to previously processed points, and the data statistics are dynamic and change over time. Our solution is based on the Dirichlet Process Mixture Model (DPMM), can work with different types of observations, and is very fast, outperforming other methods both in the quality of the results and the speed with which it achieves them. It can even be distributed across several processes and/or machines! Paper: https://dinarior.github.io/papers/Dinari_AISTATS_streaming.pdf Code (Julia Package): https://github.com/BGU-CS-VIL/DPMMSubClustersStreaming.jl Code (Python wrapper): https://github.com/BGU-CS-VIL/dpmmpythonStreaming Notebook (Julia) for creating the video: https://nbviewer.org/github/BGU-CS-VIL/DPMMSubClustersStreaming.jl/blob/main/examples/VideoSeg.ipynb submitted by /u/dinarior [link] [comments]  ( 1 min )
    [D] Best way to handle encoding disconnected graphs at the graph level.
    I am thinking of building a graph classifier that takes in graphs and labels the incoming graph. The dataset of interest to me is RadGraph: https://arxiv.org/abs/2106.14463 The issue I am having is that the graphs in RadGraph are disconnected in nature (on average 20 disconnected components), making it difficult for the various graph encoders I am aware of to do a good job classifying the graphs. submitted by /u/AICoderGamer [link] [comments]  ( 1 min )
    [D] Does someone know how much faster deepspeed's transformer implementation is?
    Implementation here Looks like they manually calculate the gradient? I'm very curious how much of a difference this makes! submitted by /u/fasttosmile [link] [comments]  ( 1 min )
    [R] My research group is publicly sharing its paper presentations! Check it out!
    https://outsystems-ai-reading-group.github.io/ submitted by /u/JClub [link] [comments]
    [R] On-the-fly Strategy Adaptation for ad-hoc Agent Coordination
    submitted by /u/hardmaru [link] [comments]
    [D] Any good free to use DALL-E style datasets?
    Are there any free to use datasets that contain image/annotation pairs in the style OpenAI used to train the DALL-E models? Pretty inspired by DALL-E 2 and think it would be cool to create a tiny less powerful replication submitted by /u/puppet_pals [link] [comments]  ( 1 min )
    [D] TensorFlow tf.range() vs range()
    TLDR: TensorFlow AutoGraph unwraps native Python ranges, baking each value into the graph. This can be an unexpected cause of graph size explosion. This recently caused an issue in my project, so I thought I'd share some more details: https://lukewood.xyz/blog/to-unroll-or-to-not-unroll submitted by /u/puppet_pals [link] [comments]  ( 1 min )
    [R] A benchmarking framework for time-series unsupervised domain adaptation
    Our work "AdaTime: A Systematic Evaluation of Domain Adaptation Algorithms on Time Series Data" is now public. We provide a benchmarking framework named "AdaTime" to fairly evaluate Unsupervised domain adaptation (UDA) approaches on time-series data. We find that UDA approaches proposed for visual data can be directly applied to time-series data, and still achieve excellent performance, even better than methods specially proposed for time-series UDA. Se were impressed by the consistently superior performance of "DIRT-T" method on all the datasets. We provide the code publicly on github https://github.com/emadeldeen24/AdaTime submitted by /u/emad_eldeen [link] [comments]  ( 1 min )
    [P][R] Announcing: Dataset & Denoising Shabby Pages Competition
    Into machine learning? Want a chance to earn a new MacBook Pro? Check out the Denoising ShabbyPages competition! The ShabbyPages dataset is being produced as a way to help train, test, and calibrate computer vision machine learning algorithms designed for working with documents. Enter the competition by training a model to remove the noise, and be awarded a MacBook Pro or some swag in the process! Check out the short paper introducing the dataset, and learn more about the competition at denoising-shabby.com. submitted by /u/proofconstruct [link] [comments]  ( 1 min )
    [R] FaceSigns: Semi-Fragile Neural Watermarks for Media Authentication and Countering Deepfakes
    Hi Everyone! We have released the preprint and google colab demo for our paper FaceSigns. FaceSigns embeds a secret bit-string as a semi-fragile watermark in the image pixels. The message is recoverable if benign image operations such as color/contrast adjustment, JPEG compression, Instagram filters are applied. However, the message cannot be decoded if the image is facially tampered (eg. DeepFake manipulation) . This selective fragility allows reliable detection of DeepFake manipulations applied on images signed using FaceSigns. Try out our google colab demo to see message encoding and decoding using FaceSigns! Paper: https://arxiv.org/abs/2204.01960 Project Webpage: https://shehzeen.github.io/facesigns Demo: https://github.com/paarthneekhara/FaceSignsDemo submitted by /u/LynxCompetitive7637 [link] [comments]  ( 1 min )
    [D] Machine learning models / ideas for Google search ads?
    Hi guys! I work in house and I’m part of our Google search team. Our ad spend is pretty large (9 figures per year, in USD). We build/manage stuff at scale using SQL, R, Javascript, and so on. So everything is pretty much “big data” in flavour. Lately I’ve been more and more interested in data science, and I’m looking to take things to the next level by incorporating machine learning into our workflow. I’d really love to build some useful machine learning models using popular Python libraries such as Pandas, SciKit Learn, NumPy, TensorFlow, PyTorch, and so on. Any suggestions on cool, and most importantly useful machine learning models I could build? (By “useful”, I mean something that could help increase the profits.) I think some classification, predictive, or recommender models would be great to start with. Cheers! 😄 submitted by /u/TropicalBound111 [link] [comments]  ( 2 min )
  • Open

    VDTTS: Visually-Driven Text-To-Speech
    Posted by Tal Remez, Software Engineer, Google Research and Micheal Hassid, Software Engineer Intern, Google Research Recent years have seen a tremendous increase in the creation and serving of video content to users across the world in a variety of languages and over numerous platforms. The process of creating high quality content can include several stages from video capturing and captioning to video and audio editing. In some cases dialogue is re-recorded (referred to as dialog replacement, post-sync or dubbing) in a studio in order to achieve high quality and replace original audio that might have been recorded in noisy conditions. However, the dialog replacement process can be difficult and tedious because the newly recorded audio needs to be well synced with the video, requiring …  ( 7 min )
  • Open

    Data Observability: Cracking the Code
    ‍What is the shortest distance between two points? A straight line of course. What if there are multiple points? Then, it depends.  A job executed in response to a user action – refreshing a dashboard, aggregating data, building a report, developing an ML algorithm, performing analytics – all require multiple hops through the data ecosystem.… Read More »Data Observability: Cracking the Code The post Data Observability: Cracking the Code appeared first on Data Science Central.  ( 5 min )
  • Open

    NN from Scratch: #2 Initializing parameters | Kolbenkraft
    submitted by /u/cjmodi306 [link] [comments]
  • Open

    Regular expressions and successive approximation
    Regular expressions can do a lot of tasks in practice that they cannot do in theory. That’s because a particular application of regular expressions comes with context and with error tolerance. For example, much has been said about how regular expressions cannot parse HTML. This is strictly true, but it says nothing about how well […] Regular expressions and successive approximation first appeared on John D. Cook.  ( 3 min )
  • Open

    Try This Out: GFN Thursday Delivers Instant-Play Game Demos on GeForce NOW
    GeForce NOW is about bringing new experiences to gamers. This GFN Thursday introduces game demos to GeForce NOW. Members can now try out some of the hit games streaming on the service before purchasing the full PC version — including some finalists from the 2021 Epic MegaJam. Plus, look for six games ready to stream Read article > The post Try This Out: GFN Thursday Delivers Instant-Play Game Demos on GeForce NOW appeared first on NVIDIA Blog.  ( 3 min )

  • Open

    Question about Model Predictive Control (MPC) cost function
    To my understanding, the cost function is the error between predicted state value and real state value. So if I use a neural network as my dynamics model(unknown true dynamics), the MPC cost function is equivalent to NN’s loss function? submitted by /u/Blasphemer666 [link] [comments]  ( 1 min )
    Learning To Play "Settlers of Catan" With Deep RL - code and write-up
    submitted by /u/henrythepaw [link] [comments]  ( 1 min )
    Which environments do you use for benchmarking?
    Hey guys I'm curious which environments you use to benchmark your standard RL algorithms. I typically use some environments from the OpenAI Gym or the DM control suite but benchmarking all my implementations against all environments for multiple seeds would take forever. Are there some of their environments you particularly like for benchmarking? submitted by /u/NiconiusX [link] [comments]  ( 1 min )
    How does advantage estimation is done when episodes are of variable length in PPO?
    In the PPO paper it is stated that we have to collect trajectories of length T from N different workers. Suppose I am not using multiple workers then I have to collect episodes N times of fixed length T. But these episode lengths are variable i.e. some episodes end much before T and some much after T. So my question is how do we calculated advantage because according to the PPO paper, for generalized advantage estimate, we have to observe the reward of terminal state. ​ So how should I calculate GAE in this ? submitted by /u/Better-Ad8608 [link] [comments]  ( 1 min )
    A silly question from a new beginner
    I could not find an answer to a question hanging around in my head for a while. Suppose we have some data, if we build up an MDP to capture actions + state dynamics. Would the optimal policy win state-of-art RL algorithms? Edit: If that is the case, why would the community bothers with learning algorithm since finding the model of dynamics is the key? ​ submitted by /u/musicinthedark [link] [comments]  ( 2 min )
    What does it look like the output of a reinforcement learning agent/algorithm in practice?
    Hello, I am relatively new in the area of machine learning/reinforcement learning. I have this basic question regarding practical implementations. I just want to know, what does it look like the output of a reinforcement learning agent/algorithm in practice? Is it like a 'look-up table' that will set the weights/parameters of the ML model based on the input data? Note that I am asking after the offline training of the agent. How to implement the trained agent in practice, like in an embedded system? Do you guys have references or clues to help me to clarify? BR submitted by /u/b0bzera [link] [comments]  ( 1 min )
    multi-discrete action space in SAC(Soft Actor-Critic)
    Hello! I am using SAC(Soft Actor-Critic) to complete a reinforcement learning task with only four steps, each action is from one of four different action spaces. These four action spaces are essentially the same, and they are all chemical compound. I just want the agent to take different types of compounds at each step. I have the following questions: Whether the different four steps can be trained? In fact, there is a paper that only has four steps in a reinforcement learning process, but his action space only has one discrete action space. Is there any article I can learn from? because I know that for the handle control of the game, there are usually multiple discrete action spaces, but each discrete space dimension of my task is larger, such as [800, 700, 500, 600] Thanks! submitted by /u/RangerWYR [link] [comments]  ( 1 min )
    Is multi bandit on policy or off policy?
    quick question submitted by /u/Asleep_Donut1382 [link] [comments]
    How wrong is it to use sampling at inference time ?
    At my company we use RL to solve our problem. The thing is : our problem is rather complex, and this is the core of our product (so clients rely on the results produced). In order to reach satisfying results despite an agent that doesn't learn very well, we use sampling at inference time : instead of taking the best trajectory according to the agent, we take X trajectories and keep only the one with the best reward. This seems completely fine at first (similar things are done in NLP for example, with beam search), but in our case the sampling size is huge : 1024. Usually when using beam search, we use maybe a beam size of 6. Maybe 10 if you have good hardware ? ​ Now, the agent seems to be learning : the mean return is slightly increasing over time, the entropies for the actions are steadily decreasing, etc... Now the goal of the ML team is to improve agent's learning to decrease the sampling size at inference time (because it's costly to run 1024 trajectories through the environment...). But whatever we try, the improvements are not reflected (we compare all our experiments with 1024 sampling in order to see what the customers will see). ​ IMO this is because our sampling size is way too huge, even a random agent can produce okay-ish results... Is my intuition the right one ? submitted by /u/dummy-gummy [link] [comments]  ( 2 min )
    Does a master's thesis/doctoral dissertation need to have implications down the line for it to be good?
    I'm in the last semester of my undergraduate degree; over the past couple of weeks, I've been trying to brainstorm ideas that I would like to pursue in my graduate research career. I'm interested in the emergence of language in multi-agent reinforcement environments but I can't see how this would be important down the line when there are large language models that are completely dominating language and communication. Should this stop me from pursuing this idea or should I let my interest in the idea take precedence? submitted by /u/clarky103 [link] [comments]  ( 1 min )
    How long would it take you to implement a MARL PPO agent with joint attention architecture?
    Out of curiosity, how long would it take to implement a paper like this one? https://arxiv.org/abs/2104.07750 It has PPO agents in MARL, all of them with multihead attention performed on the observation, in such a way that an attention map is created for each agent. This attention map has information about how strongly each agent is attending to various elements of the environment. With KL divergence, the agents are rewarded for minimizing the difference between their attention maps. submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
  • Open

    [R] Using Gamma Distribution to Improve Long-Tail Event Predictions at Doordash
    Predicting longtail events can be one of the more challenging ML tasks. Last year my team published a blog article where we improved DoorDash’s ETA predictions by 10% by tweaking the loss function with historical and real-time features. I thought members of the community would be interested in learning how we improved the model even more by using Gamma distribution-based inverse sampling approach to loss function tunning. Please check out the new article for all the technical details and let us know your feedback on our approach. https://doordash.engineering/2022/04/06/using-gamma-distribution-to-improve-long-tail-event-predictions/ submitted by /u/pmp-dash1 [link] [comments]  ( 1 min )
    [D] ICML author response. What reviewers expect.
    Hi, we submitted to ICML for the first time. We got 4 reviews and 3 of them are mostly positive. Major comments by the reviewers include: more justification on the assumptions, discussion on choices of parameters, and experiments in more complex and different environments. We want to address all the major and minor comments as best as we can but given that the response is limited to one page we cannot explain everything in detail. I am not sure what is the acceptable norm here. Do reviewers expect the authors to conduct some experiments during the rebuttal and provide sample results or just explain what additional experiment we will conduct and how we will do it. Justification and reasoning should be in details or a brief explanation with an assurance to add a detailed discussion in the final version suffices. TIA submitted by /u/srvsinha186 [link] [comments]  ( 1 min )
    [D] Questions for a TPM for ML interview at Google
    Hey all, I have a technical program manager interview soon for an ML team at google and I want to know if anyone has any sample role-related questions I can gauge myself with. I have a strong data science & statistics background but that doesn't always translate to deep ML knowledge like an ML Engineer might have. Any resources or sample questions? I have not found adequate results from google regarding this team area specifically. submitted by /u/math_is_my_religion [link] [comments]  ( 1 min )
    [D] Anyone knows any high accuracy models on UCI adult dataset?
    Hi everyone. This is my first-time post here, and I hope I did not break any sub rules. Currently, I am doing some research with the UCI Adult dataset(https://archive.ics.uci.edu/ml/datasets/adult). This first step is to build a high-accuracy classifier model. Does anyone know any high accuracy model on this dataset (more than 90%)? I use many machine learning models like logistic regression and neural network. But no matter how complex the model is, I can only get an accuracy of about 85% on the test set. I tried to google but I found many others also have similar results of about 85%. Any posts or papers will be helpful! Thanks in advance for your help! submitted by /u/Akasakura888 [link] [comments]  ( 1 min )
    [R] Using Gamma Distribution to Improve Long-Tail Event Predictions
    Predicting longtail events can be one of the more challenging ML tasks. Last year my team published a blog article where we improved DoorDash’s ETA predictions by 10% by tweaking the loss function with historical and real-time features. I thought members of the community would be interested in learning how we improved the model even more by using Gamma distribution-based inverse sampling approach to loss function tuning. Please check out the new article for all the technical details and let us know your feedback on our approach. ​ https://doordash.engineering/2022/04/06/using-gamma-distribution-to-improve-long-tail-event-predictions/ submitted by /u/pmp-dash1 [link] [comments]  ( 1 min )
    [D] Reading the Tea Leaves: Expert End-Users Explaining the Unexplainable
    Hey there, just a heads up we at The Gradient just published a new article discussing explainability - "This article uses the common backdrop of competitive games to explore the ways in which domain experts adapt to new technologies that lack explainability. I illustrate how interpretations vary based on user experience and model architecture, and how special care must be taken when adapting models to human-centric problems." Check it out here if you think it's interesting / worth discussing: Reading the Tea Leaves: Expert End-Users Explaining the Unexplainable submitted by /u/regalalgorithm [link] [comments]  ( 1 min )
    [Project] Learning to Play "Settlers of Catan" With Deep RL - Writeup and Code
    Hi all, I just wanted to share a project I've been working on for the past year - using deep RL to learn to play the board game Settlers of Catan. I expect everyone is aware of the results that DeepMind/OpenAI have got recently on Go, DOTA 2, Starcraft 2 etc, but I was motivated to see how much progress could be made with existing RL techniques on a reasonably complex game - but with access to significantly less computational resources. Whilst I didn't end up with an agent that performs at a super-human level, there was clear learning progress and the results were quite interesting. I decided to do a full write-up of the project here, which I figured could be useful for anyone else who is interested in trying to apply DRL to a new, complicated environment. I also open-sourced all the code here for anyone interested. If anyone has any feedback or any questions at all that'd be great! submitted by /u/henrythepaw [link] [comments]  ( 3 min )
    [D] ICML rebuttals optional or semi-mandatory?
    Hi, We just submitted to ICML 2022 and got our reviews back. We were excited to see that 4/4 reviews were positive and acknowledged the contribution of the paper. However, there were some minor criticisms (e.g. didn't do good enough lit reviews, could use a few more experiments) across several reviews. I was wondering if it is ever acceptable to not submit a rebuttal? Can a rebuttal in this case actually hurt us by rocking the boat---or for ICML is the norm that you should always submit a rebuttal that addresses all the reviewers' criticisms. We were wondering what the norm is for ICML specifically? submitted by /u/optimistic313 [link] [comments]  ( 1 min )
    [R] Hierarchical Text-Conditional Image Generation with CLIP Latents. This is the paper for OpenAI's DALL-E 2
    Blog post. Paper (pdf file format). The paper is also linked to in the above blog post. Abstract Contrastive models like CLIP have been shown to learn robust representations of images that capture both semantics and style. To leverage these representations for image generation, we propose a two-stage model: a prior that generates a CLIP image embedding given a text caption, and a decoder that generates an image conditioned on the image embedding. We show that explicitly generating image representations improves image diversity with minimal loss in photorealism and caption similarity. Our decoders conditioned on image representations can also produce variations of an image that preserve both its semantics and style, while varying the non-essential details absent from the image representation. We use diffusion models for the decoder and experiment with both autoregressive and diffusion models for the prior, finding that the latter are computationally more efficient and produce higher-quality samples. OpenAI's Sam Altman used DALL-E 2 to generate ~20 text prompt requests from Twitter users. The results are here, with individual result links and other samples in this comment from another Reddit user in a different post. Twitter thread about the paper (not from the paper authors). Sam Altman's blog post about DALL-E 2. Hopefully this summer, we’ll do a product launch and people will be able to use it for all sorts of things. submitted by /u/Wiskkey [link] [comments]  ( 3 min )
    Is the 'first boss attempt' phenomenon know to occur amongst NN playing games, or is this learning trajectory unique to human players?[D]
    I'm curious about wether this unusual learning trajectory observed in humans has also been observed in artificial neural nets. A well known phenomenon in the 'dark souls' video game series is that ones first attempt at a boss is often much better than subsequent attempts. Boss hp at time of death by attempt might go something like: 35%, 55%, 85%, 87%, 75%, 54%, , 60%, 43%, 27%, 38%, 12%, 0%. This sounds very anecdotal, but its know to the community of these games to be a real thing. See this thread for evidence. Have NN playing games been known to exhibit a similar pattern, with peak in success early on, followed by a step descent , then a slow gradual climb? Or is this a purely human phenomenon? My hypothesis as to why this happens is that over the course of the first couple attempts, the player learns a bunch of bad strategies which must be slowly unlearned, whereas on attempt one, the player has no defined strategies good or bad. submitted by /u/Greenface1998 [link] [comments]  ( 3 min )
    [R] Disentangling Abstraction from Statistical Pattern Matching in Human and Machine Learning
    submitted by /u/papajan18 [link] [comments]
    [Project][P] Who invented Graph Neural Networks?
    Just a side project (only for me) in which I try to sum up some history of DL. Can't be 100% sure this is the first article in which they appear: Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., & Monfardini, G. (2008). The graph neural network model. IEEE transactions on neural networks, 20(1), 61-80. Would appreciate any help. Thanks submitted by /u/Siddh__ [link] [comments]  ( 2 min )
    [R] GP-BART: a novel Bayesian additive regression trees approach using Gaussian processes
    (not my paper) paper: https://arxiv.org/abs/2204.02112 abstract: "The Bayesian additive regression trees (BART) model is an ensemble method extensively and successfully used in regression tasks due to its consistently strong predictive performance and its ability to quantify uncertainty. BART combines "weak" tree models through a set of shrinkage priors, whereby each tree explains a small portion of the variability in the data. However, the lack of smoothness and the absence of a covariance structure over the observations in standard BART can yield poor performance in cases where such assumptions would be necessary. We propose Gaussian processes Bayesian additive regression trees (GP-BART) as an extension of BART which assumes Gaussian process (GP) priors for the predictions of each terminal node among all trees. We illustrate our model on simulated and real data and compare its performance to traditional modelling approaches, outperforming them in many scenarios. An implementation of our method is available in the \textsf{R} package \texttt{rGPBART} available at: https://github.com/MateusMaiaDS/gpbart." submitted by /u/bikeskata [link] [comments]  ( 2 min )
    [D] Anyone know about any interesting recent improvements with SNNs?
    I’m currently writing a research paper for my MSc on neuromorphic sensing and spike neural networks and most good papers are from around 2015 and was looking for something more recent. Anyone here heard of any interesting upgrades in architecture or applications? Cheers! submitted by /u/GandhisLittleHelper [link] [comments]
    Noticing that profs focus on male student’s goals and female student’s capabilities, any weigh-in? [D]
    Hello, I’m currently a graduate student. I do different projects and for some I get to decide on what I want the scope to be. I do have to get the scope/ plan/ idea approved first. I pitch my ideas to profs who aren’t directly my profs and normally 5-6 other students will pitch ideas to the same group of profs at the same time….. I noticed that i get really different questions and feedback in comparison to my peers. I’m a female and my peers are male… I didn’t start out with this outlook but I’m starting to search for reasons why I often get questioned about my capabilities to preform a project ( which is normal enough but I get questioned to the point where explaining my approach isn’t enough and they ask me for examples of codes) and my peers definitely do not get asked about there capabilities, rather they tell them what they can do and they don’t get questioned. …………… really frustrating. submitted by /u/tyger-lily [link] [comments]  ( 4 min )
    [D] How to write a ML+Healthcare paper where the research was a framework with pre-trained models
    As a project in the course of my PhD, I had to create a prototype for a project. My PhD is application of machine learning in health care. The project definition and scope was faaaar too wide. However, I managed to create a working demo which encompasses some use cases of the project. At best, it can be called a framework, where I have put in different DL components and it works okay for those use cases only. Most of the components, I have used are pre-trained language models (maybe fine tuned them to my use case). However, there is no active training or learning involved. This is because I created this for a demo only. I also created a very small dataset and tested the framework over the dataset and the results were ok. However, my supervisor now wants me to write a paper, as he is confident, that the use case is rather unique and my working framework is a good first step. I believe, his aim is to get me started on the paper writing process, which I appreciate. However, I am not confident about it at all. My question is, can a 'framework' composed of pre-trained models with the end goal of solving a problem in health care is good enough? Are there precedents of any such paper? And if I trust my supervisor's instincts, are there any fancy ways to frame the framework so that it does not look so basic? submitted by /u/Complex_State9960 [link] [comments]  ( 2 min )
    [P] Building a knowledge based recommender system
    I am trying to build a knowledge based recommender system but do not have prior knowledge. We first take in user inputs such as occasion, weather, top wear and bottom wear, color. Based on this we want to create a knowledge base and recommend clothes. Can anyone help me on how to go about on doing this process step by step and what algorithms and technology to be used? submitted by /u/bills70 [link] [comments]  ( 1 min )
    [D] ICML 2022 Paper Reviews
    ICML 2022 paper reviews are supposed to be released soon. Creating a discussion thread for this year's reviews. submitted by /u/zy415 [link] [comments]  ( 2 min )
    [D] In general, should you let the model find interactions between many basic features, or should you use feature engineering to ‘help’ the model find the interaction?
    I’ll give an example to better explain my question (don’t get hung up on the numbers, it’s all made up). Say you are using a tree based model trying to project how many points a player will score in a given basketball game. Most players shoot free throws at a slightly lower percentage on the road, than they do at home. However, the magnitude varies player to player. Let’s assume for 95% of players with significant data, the ratio of home free throw percentage to away is 1 to 1.15. Generally speaking, older players are closer to 1 and younger players are around 1.1 (since older players get used to the opposing crowd). Now also say it takes 100 home and 100 away free throws to get a stable reliable ratio. Now say a young player only has 50 home, and 50 away free throws. With this amount of data he has a ratio of 1, however the sample size is not enough to be fully stable. Which would be better… 5 features into this model, his home away ratio, average ratio for players his age, home free throw count, and away free throw attempts. 1 feature. His ‘projected’ home away ratio, which is a weighted average of his ratio with the average for plaeyrs his age. Since he’s 50% of the way to significance, 0.5 * 1 + 0.5 * 1.1 = 1.05 The benefit of the of the first choice is that it may find other interactions that I never conceived of, however, it could incorporate noise. Is there a general consensus, or is this just a try both and see what works? submitted by /u/irndk10 [link] [comments]  ( 4 min )
  • Open

    Quick Little Keras Question
    I tried posting this on stackoverflow with no response. Im trying to use model.save() and keras.models.load_model() on this chunk of code. But, unlike some of the other keras examples I've played with, this one seems to crash. I'm super new to this, any Ideas why? I can post the error message if it helps. submitted by /u/HoneyBunchsOGoats [link] [comments]  ( 1 min )
    Driving a robot with a neural network - use case study
    submitted by /u/KamilBugnoKrk [link] [comments]
    Here's an intuitive explanation to Singular Value Decomposition. 👇
    submitted by /u/mr-minion [link] [comments]  ( 1 min )
  • Open

    Weekly China AI News: Slime Robot Grabs Swallowed Objects; SenseTime Revenue Grows Despite $2.7B Net Loss; Transformer Architecture Search Without Training
    submitted by /u/trcytony [link] [comments]
    Reading the Tea Leaves: Expert End-Users Explaining the Unexplainable
    submitted by /u/regalalgorithm [link] [comments]
    How do we know that A.I hasn't already taken over our worlds ? How do we know this isn't the matrix ? #simulation
    submitted by /u/Individual-Fly-610 [link] [comments]  ( 2 min )
    DALL·E 2
    submitted by /u/roblox22y [link] [comments]
    Learn how GANs work with a cool Toonify example!
    submitted by /u/OnlyProggingForFun [link] [comments]
    Artificial Intelligence, Machine Learning and the Higgs boson - Live talk with Dr. David Rousseau
    submitted by /u/aair_x [link] [comments]
    What are your thoughts about AI teachers?
    submitted by /u/curiosityVeil [link] [comments]  ( 1 min )
    Here's an intuitive explanation to Singular Value Decomposition. 👇
    submitted by /u/mr-minion [link] [comments]
    Artificial Nightmares: Beauty Parlor || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
  • Open

    Fast and Luxurious: The Intelligent NIO ET7 EV Built on NVIDIA DRIVE Orin Arrives
    Meet the electric vehicle that’s quick-witted and fully outfitted. Last week, NIO began deliveries of its highly anticipated ET7 fully electric vehicle, in Hefei, China. The full-size luxury sedan is the first production vehicle built on the NIO Adam supercomputer, powered by four NVIDIA DRIVE Orin systems-on-a-chip (SoCs). The production launch of its flagship sedan Read article > The post Fast and Luxurious: The Intelligent NIO ET7 EV Built on NVIDIA DRIVE Orin Arrives appeared first on NVIDIA Blog.  ( 2 min )
    NVIDIA Orin Leaps Ahead in Edge AI, Boosting Leadership in MLPerf Tests
    In its debut in the industry MLPerf benchmarks, NVIDIA Orin, a low-power system-on-chip based on the NVIDIA Ampere architecture, set new records in AI inference, raising the bar in per-accelerator performance at the edge. Overall, NVIDIA with its partners continued to show the highest performance and broadest ecosystem for running all machine-learning workloads and scenarios Read article > The post NVIDIA Orin Leaps Ahead in Edge AI, Boosting Leadership in MLPerf Tests appeared first on NVIDIA Blog.  ( 3 min )
  • Open

    Receive notifications for image analysis with Amazon Rekognition Custom Labels and analyze predictions
    Amazon Rekognition Custom Labels is a fully managed computer vision service that allows developers to build custom models to classify and identify objects in images that are specific and unique to your business. Rekognition Custom Labels doesn’t require you to have any prior computer vision expertise. You can get started by simply uploading tens of […]  ( 7 min )
  • Open

    An optimized solution for face recognition
    When artificial intelligence is tasked with visually identifying objects and faces, it assigns specific components of its network to face recognition — just like the human brain.  ( 5 min )
    Does this artificial intelligence think like a human?
    A new technique compares the reasoning of a machine-learning model to that of a human, so the user can see patterns in the model’s behavior.  ( 7 min )
  • Open

    DSC Weekly Digest: Moving Time
    For nine years, my family and I have lived in a house in Issaquah, a little community about twenty minutes east of Seattle. The town still retains its charms — a downtown area about three blocks long that includes a vintage (and long since decommissioned) gas station, numerous restaurants, a live theater, the library, and… Read More »DSC Weekly Digest: Moving Time The post DSC Weekly Digest: Moving Time appeared first on Data Science Central.  ( 7 min )
  • Open

    Efficiently Initializing Reinforcement Learning With Prior Policies
    Posted by Ikechukwu Uchendu, AI Resident and Ted Xiao, Software Engineer, Robotics at Google Reinforcement learning (RL) can be used to train a policy to perform a task via trial and error, but a major challenge in RL is learning policies from scratch in environments with hard exploration challenges. For example, consider the setting depicted in the door-binary-v0 environment from the adroit manipulation suite, where an RL agent must control a hand in 3D space to open a door placed in front of it. An RL agent must control a hand in 3D space to open a door placed in front of it. The agent receives a reward signal only when the door is completely open. Since the agent receives no intermediary rewards, it cannot measure how close it is to completing the task, and so must explore …  ( 8 min )
  • Open

    DALL·E 2
    DALL·E 2 is a new AI system that can create realistic images and art from a description in natural language.  ( 2 min )
  • Open

    Sum the zeros of an analytic function without finding them first
    A couple days ago I wrote about how Vieta’s formulas let you sum the zeros of a polynomial without having to first compute the zeros. This is especially handy for high-order polynomials since there is no explicit formula for the zeros. Most functions that arise in applications are not polynomials. How could you find the […] Sum the zeros of an analytic function without finding them first first appeared on John D. Cook.  ( 3 min )

  • Open

    Last Week in AI: AI improves algae for biofuel and carbon capture, more AI decision-making in the military, and more!
    submitted by /u/regalalgorithm [link] [comments]
    Researchers From Allen Institute for AI Introduce ‘MERLOT Reserve’: A Novel Multimodal Video Question Answering Model
    We humans navigate the environment using all of our senses. Allen Institute researchers propose MERLOT Reserve, a model that learns to represent videos over time and across several modalities, including audio, subtitles, and video frames. It was trained using a new learning objective and more than 20 million YouTube videos. MERLOT Reserve is a unique, cutting-edge methodology for solving video-related inquiries. MERLOT Reserve can dependably choose the correct answer from a selection of multiple-choice answers when given a video and a question. This forecast is made by MERLOT Reserve jointly reasoning over the visual frames of the video, the video subtitles, and the audio in the movie. Continue reading this cool research update from AI2 Paper: https://arxiv.org/pdf/2201.02639.pdf Demo: https://merlot-reserve.apps.allenai.org/ Project: https://rowanzellers.com/merlotreserve/ Github: https://github.com/rowanz/merlot\_reserve ​ https://preview.redd.it/031i6ty6err81.png?width=1920&format=png&auto=webp&s=299569e12160eb991f35a2c6b41c5758ff027235 submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    EndlessVN open alpha today
    submitted by /u/roblox22y [link] [comments]
    AI Meets Quantum Technology in New Google Spinoff, Sandbox AQ - News
    submitted by /u/allaboutcircuits [link] [comments]
    Best undergraduate major besides computer science for pursuing a career in artificial intelligence?
    Hi, all. I got accepted into my top choice of college as an undecided major. Recently, I have decided to pursue artificial intelligence! Unfortunately, it is near impossible to transfer into computer science at my particular university. I was wondering if I can still pursue AI as a career if I complete one of the following majors: -Mathematics -Information or Data Science -Statistics -Linguistics Additionally, I could pursue one of these and minor in another. I should be able to minor in computer science as well if necessary. Hopefully, my choice of major would allow me to pursue research or an internship in artificial intelligence. I am willing to take additional summer courses and pursue relevant certifications to ensure that I am up to par with my computer science colleagues. (posted on behalf of a family member) submitted by /u/runelagoon [link] [comments]  ( 2 min )
    Comparing old and new AI voices from Replica Studios (new in second half)
    submitted by /u/autumns [link] [comments]
    Super-res model/program comparison
    I upscaled an image with a few different superres models and programs, pick your favorite! https://files.botbox.dev/superrestestcollage.png Because of how reddit is, I can't make this as a poll, so comment your pick. Animated original version: https://www.youtube.com/watch?v=zRaTwVuqd70 (I will also make an animated version upscaled with the most voted model/program) submitted by /u/Recent_Coffee_2551 [link] [comments]
    solo voiceovers
    I am looking for something to change my voice in a way that is more satisfactory and more convincingly varied than what simple voice modulation software can achieve and as cheaply as is possible (preferably free). Use case: I have been working on an animated movie to which I am the sole contributor. Though I have been putting it off while looking for an appropriate solution, the time has come to voice my various characters, who are a range of ages, both male and female. For several reasons, I am interested in voicing them all myself while doing the facial motion captures as well. What I am in need of is, essentially, something that does exactly what Respeecher does, but without the $200/month sub fee. I would love to be in a position to simply pay them what they are asking for in exchange…  ( 2 min )
    Artificial Nightmares: Frenzied Flame || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
    AI that takes multiple songs as input, and then generates a similar song or song with similar elements?
    I have been searching for a music AI that takes input as mp3 or midi files, yet haven't been successful yet. Is there such a thing? If not, is such a thing feasible? submitted by /u/16pxl [link] [comments]  ( 1 min )
  • Open

    [D] Why aren't new LLMs using the Perceiver architecture?
    Perceiver and PerceiverIO (https://arxiv.org/abs/2107.14795) appear to offer significantly improved FLOP efficiency, but new LLMs (including Deepmind's own Gopher) don't use it. What gives? Is it still too new, or is the Perceiver architecture not appropriate for LLMs? submitted by /u/deeceeo [link] [comments]
    [R] Meta-Learning Machines in a Single Lifelong Trial: lecture video (24 min) presented at meta-learning workshops at ICML 2020 and NeurIPS 2021 (Schmidhuber YouTube Talk)
    Saw this posted on Schmidhuber's Twitter: Meta-Learning Machines in a Single Lifelong Trial: lecture video (24 min) presented at meta-learning workshops at ICML 2020 and NeurIPS 2021. URL of talk: https://youtu.be/2GgGVdkq2bU Abstract The most widely used machine learning algorithms were designed by humans and thus are hindered by our cognitive biases and limitations. Can we also construct meta-learning algorithms that can learn better learning algorithms so that our self-improving AIs have no limits other than those inherited from computability and physics? This question has been a main driver of my research since I wrote a thesis on it in 1987. In the past decade, it has become a driver of many other people's research as well. Here I summarize our work starting in 1994 on meta-reinforcement learning with self-modifying policies in a single lifelong trial, and - since 2003 - mathematically optimal meta-learning through the self-referential Gödel Machine. This talk was previously presented at meta-learning workshops at ICML 2020 and NeurIPS 2021. Many additional publications on meta-learning can be found at https://people.idsia.ch/~juergen/metalearning.html submitted by /u/hardmaru [link] [comments]  ( 1 min )
    [D] Hyperparameter Tuning: does it even work?
    Hi *, I've been working for the last 5 years as Data Scientist. During this time I have tried dozens of times to improve my models via hyperparameter tuning, but I've never got improvements from there. I've tried all the possible approaches: grid search, random search, bayesian search, etc. But in no case did I get satisfactory results. Does this happen to anyone else? Have you ever got robust improvements via HP tuning? submitted by /u/AM_DS [link] [comments]  ( 1 min )
    [D] Autoregressive model for graph generation?
    Autoregressive models like GPT-2 do fairly well in text generation. Is it possible to do the same for graph data? A transformer based model Graphormer has recently shown its effectiveness in graph representation learning. Is there any way I can train Graphormer or any other model to generate graphs from an initial graph context? submitted by /u/ratt_m [link] [comments]
    [D] How do you guys hear about the latest papers?
    Hi! I'm a first-year Grad Student in Computer Vision and I am trying to get caught up on the latest research in my field. It seems like everyone in CS has heard about all of the latest papers but I just have no idea how. My knowledge is limited to general ideas and doesn't know any specific papers unless they have like 20000+ citations. So my question is: how do you hear about these papers and get caught up? Is there a reference somewhere that puts together a list of all the "must-read" papers that have come out? I feel like I am already 5 years behind in my knowledge. It would be great if there was something like "Top 5 papers of the week" that I could read to stay on top of things. Also, this doesn't just apply to Vision. I would like to have an idea of the other major developments in other fields (like NLP, general ML/DL, etc.) since I think that can carry over to my field. Thanks! Looking forward to your replies submitted by /u/TobusFire [link] [comments]  ( 2 min )
    [D] Fake authors and paper riders
    Based on my experiences in both academia and industry, I see that many researchers get listed as authors on papers solely for having attended the relevant project meetings, despite not contributing anything substantial to the work. I know of several people who've gotten on dozens of papers this way, despite not being able to explain the main details behind many of the papers they "co-authored." Of course, they can then claim credit for the work publicly as well as have their academic profile benefit from the citations accrued by the work. I've noticed that typically, these people are initially invited onto the project because they are on chummy terms with someone on the project. Concerningly, the more someone successfully "paper-rides" this way, the stronger their publication record looks, which makes it easier for them to find their way onto more projects to paper ride in the future. It seems that the obsessive focus on paper counts and citations has encouraged the rise of intellectually dishonest strategies for maximizing one's academic footprint. The huge research scientist salaries at top industry labs, which similarly obsess over paper counts and citations in their hiring process, only amplifies the incentive for paper riding. The reason I think it is bad: As more people paper ride, co-authorship on a paper gradually becomes a worse indication of expertise. Not to mention, paper riders are intellectually dishonest, by claiming credit for research that they didn't significantly contribute to. In a sense, it seems like a roundabout form of plagiarism. I know some might disagree with this take, as some people believe in being as generous about co-authorship as possible. I find that mindset to create the perfect environment for paper riders to flourish. I'm wondering if you've also seen paper riding happen and whether you think this behavior is good or bad. submitted by /u/alwayshumming [link] [comments]  ( 7 min )
    [D][R] Generate random sample for exponentiated Weibull distribution
    Hi there experts, I have a real distribution for which I had run this scipy script to detect the best fit: However, the script outputs 4 parameter values and the best fit is actually a Exponentiated Weibull distribution. Now I am clueless how to generate a sample list of data of n-size. I know for sure about the normal distribution after getting these params as mean and sigma. How to I generate such list. Please help. ​ ​ https://preview.redd.it/79n28icmsqr81.png?width=1141&format=png&auto=webp&s=d9478691c06f5cdfe03af4f82db8293443e91f1e submitted by /u/GoldenDew9 [link] [comments]  ( 1 min )
    [R][D] VAE Embedding Space - Can we force it to learn a metric?
    I understand that certain AE types such as B-VAE disentangle certain aspects of variation in the data, and those such as Conditional AE or VAE allow us to separate these aspects with labels. However, what I have seen is that the embedding space doesn't cluster the images as well as some contrastive methods. However contrastive methods require non-elegant negative sampling etc. Can we somehow force the VAE to learn both the variational lower bound as well as learn a good metric between samples such as visually similar samples are better clustered together? submitted by /u/jim_from_truckistan [link] [comments]  ( 1 min )
    [D] Jetson AGX Orin dev kit as a stand-alone training platform
    The Jetson Orin 64gb model has "275 Sparse|138 Dense INT8 TOPS", and I am a little confused about how to compare this to something like the RTX a6000's performance. I am looking to do deep rl training and am new to the field. What metrics make a difference for deep rl? Any thoughts on the Orin dev kit's ability to train deep rl? submitted by /u/here_to_create [link] [comments]  ( 1 min )
    [D] With the rise of AutoML, what are the important skills for a ML career?
    Some time down the road, when AutoML becomes more established, it can help us determine the best ML model and hyperparameters for a particular problem. This will not replace data scientist, as we still need data scientists for their domain knowledge, which is critical for scoping business problems, pre-processing data, and deriving business insights from the trained model. However, since data scientists no longer need to deal with the technicalities of a model in the near future (i.e. they no longer have to tune hyperparameters, determine the best opitmistion function etc), is there still a need for aspiring data scientists to learn about the intricacies and nuances behind the various models (maybe by coding the model from scratch)? Or is it enough for them to learn how to operate an AutoML system? (My question is referring to the corporate world in general and not to academia) Thanks in advance for your answers :) submitted by /u/smart_oinker [link] [comments]  ( 7 min )
    [P] AutoML-Conf Competition: DAC4AutoML
    Hi everyone! We've just launched a competition at the AutoML-Conf 2022, the DAC4AutoML competition. It has two tracks, one for configuring a Computer Vision model and one for a RL pipeline: https://automl.github.io/dac4automlcomp/ And what is DAC exactly? It means we want to find well-performing hyperparameter configurations like in Algorithm Configuration, but we do it dynamically - thus DAC, Dynamic Algorithm Configuration. As to how that is supposed to happen? We don't put any restrictions on the solutions for the competitions, so you can submit your hand-tuned static hyperparameter setting if you want. Or you can use some sort of heuristic, a regression model, reinforcement learning, ... whatever works. If you're interested in participating, you can submit from now on until the 18.06. AOE, the winners will be announced at the AutoML-Conf. submitted by /u/catsortion [link] [comments]  ( 1 min )
    [D] Imagenet Original Pictures
    As I understood it Imagenet got generated from internet images, but I am unable to to find the originals using naive image search. Is there any mapping? I wonder if imagenet data is a cropped versions of original pictures or not, i don't see it in the paper. submitted by /u/LeanderKu [link] [comments]  ( 1 min )
    [P] UFO Lands on Highway! Or Depth Estimation using ML
    Article describing depth estimation using machine learning models and 3D visualization of depth maps using three.js. https://www.storminthecastle.com/posts/ufos_and_depth/ submitted by /u/CakeStandard3577 [link] [comments]
    [D] Could Stylegan-XL be great for out-of-domain generation?
    In the context of text-to-image generation, I'd say one of the reasons VQGAN is so used in popular notebooks is that it can deal with many concepts, while stylegan used to be limited to the domain it was trained for. That may be about to change with the rollout release of Stylegan-XL weights trained on Imagenet. This notebook (https://github.com/CasualGANPapers/StyleGANXL-CLIP) has had nice results with objects never seen by the model, such as "apple" and "ant", as well as scenes such as "judo athletes fighting" Please note that the Stylegan-XL weights are currently available for 128x128 pixels. ETA for the 256 resolution is 14.04.22 submitted by /u/HrodRuck [link] [comments]  ( 1 min )
    [Discussion] Support Vector Machines... in 2022
    My post is inspired by this discussion. In that thread, OP asked why support vector machines are still taught. People offered several thoughts: they're easier to think about, they're still perfectly good for some real-world problems, and for some problems they apparently rival deep networks. I did a project for a class around six years ago using an SVM as implemented in scikit-learn. I was pretty satisfied with the project, but I also experienced some frustrations, and came away with some questions. I started working with Tensorflow and DNNs in earnest soon after finishing that project, and I largely stopped thinking about SVM. I would like to revive the questions I asked, but never answered, here. A DNN with multiple outputs can potentially use a single neuron in the prediction of more than one output. For multiple, mutually-exclusive categories, this makes good sense. An SVM with multiple outputs in scikit-learn was implemented as pairs of one-vs-one SVMs, each of which was independently fit to data. This gets inefficient quickly. Has this changed? Can it be changed? DNN training at scale is a problem that many people have worked hard to make practical. Even non-experts like myself use our home GPUs to accelerate training of DNNs on large data sets. In scikit-learn, SVM training was implemented in a single thread on one CPU core. If you are performing cross-validation or a hyperparameter optimization study, it might be practical to parallelize fitting; one thread for each distinct condition. But can you parallelize the SVM fitting algorithm for a single condition? I went looking for software, but I couldn't find anything. Over to you folks. Cheers. submitted by /u/aotus_trivirgatus [link] [comments]  ( 6 min )
    [R] Restormer: Efficient Transformer for High-Resolution Image Restoration (CVPR 2022--ORAL) + Colab Demo + Gradio Web Demo
    ​ Visual Results With Restormer, you can remove noise, motion blur, defocus blur, and rain streaks from your own images. Paper: https://arxiv.org/abs/2111.09881 Github: https://github.com/swz30/Restormer Colab Demo: https://colab.research.google.com/drive/1C2818h7KnjNv4R1sabe14_AYL7lWhmu6?usp=sharing Gradio Web Demo: https://huggingface.co/spaces/swzamir/Restormer submitted by /u/swz30 [link] [comments]  ( 1 min )
    [R] [D] Seq2seq model hyperparameters tuning
    Does anyone have any advices or research papers on what hyperparameters do researchers use for their seq2seq model? I am interested in knowing whether hyperparameters such as dropout, or recurrent dropout, batchnorm, etc etc, are even necessary in the usage of seq2seq model, but couldn’t find anything on it for weeks. In the case, let’s say, using gridsearchCV, what hyperparameters do you tweak for ur seq2seq model? (Other than the usual stuff like number of neurons, etc). There is absolutely zero information for that on seq2seq model, and everyone just assumes that putting an attention mechanism solves everything without hyperparameters tunings. I have also looked up on codes on seq2seq, and no hyperparameters tunings were shown whatsoever. FYI, this is in the context of time series data, using seq2seq, if that matters. Thanks submitted by /u/plsendfast [link] [comments]  ( 1 min )
    [D] Has anyone seen any papers related to GANs which prove that the optimum remains unchanged when adding supervised loss (e.g. L1, L2)?
    I’ve been reading many papers lately pertaining to GANs, with more and more introducing supervised loss into the generator’s objective function. However, no one ever seems to show that the optimum remains undisturbed. Results seem to be strictly empirical most of the time. Has anyone seen any papers where it is shown that the disruption to the generator’s loss doesn’t harm convergence? submitted by /u/king_of_walrus [link] [comments]  ( 1 min )
    [D] What is your experience with Fake results or overfitted results being sold as awesome?
    I am curious what is everyones experience with completely faked, falsified, or fabricated results in the area? Another aspect of this I think is people taking heavily overfitted results and finding one decent example that is from the test set and claiming their method is awesome. How much of this have you seen and how much of the research out there fits into this category? submitted by /u/LifeguardDismal142 [link] [comments]  ( 1 min )
  • Open

    Win tickets to The AI Summit London 2022
    Sponsored Post Join the UK’s most forward-thinking technologists and business professionals this June in a celebration of emerging technology. Machine […] The post Win tickets to The AI Summit London 2022 appeared first on Machine Learning Mastery.  ( 2 min )
  • Open

    MIT has trained AI to generate new molecular materials
    submitted by /u/aidev2040 [link] [comments]
  • Open

    Building a Data Products-centric Business Model
    When I was the Vice President of Advertiser Analytics at Yahoo!, I painfully learned that my targeted user personas (Media Planners & Buyers and Campaign Managers) didn’t want more data in helping them optimize their marketing, campaign, and advertising spend across the Yahoo! Ad Network.  Heck, they didn’t even want analytics!  The aspirations for these… Read More »Building a Data Products-centric Business Model The post Building a Data Products-centric Business Model appeared first on Data Science Central.  ( 5 min )
    Data Discovery for ML Engineers
    Real-world production ML systems consist of two main components: data and code. Data is clearly the leader, and rapidly taking center stage. Data defines the quality of almost any ML-based product, more so than code or any other aspect. In Feature Store as a Foundation for Machine Learning, we have discussed how feature stores are… Read More »Data Discovery for ML Engineers The post Data Discovery for ML Engineers appeared first on Data Science Central.  ( 9 min )
    Content Metrics That Can Help You To Write Dissertations
    Many things can help you to write good dissertations. One of the most important is to use content metrics. It is necessary for all of the students to understand content metrics in detail. A clear understanding of its types and measuring strategies help you to evaluate things in a precise way. Whatever is your topic… Read More »Content Metrics That Can Help You To Write Dissertations The post Content Metrics That Can Help You To Write Dissertations appeared first on Data Science Central.  ( 5 min )
    Comparative analysis of an Intel and AMD Processor
    The need of a highly functional and fast processing Central Processing Unit (CPU) in today’s world is not just mostly desired, but also mostly required due to the rapid digitalization across the globe. Whether you work on a personal computer (PC) unit or laptop, the necessity of a highly advanced processor is indispensable.  This is… Read More »Comparative analysis of an Intel and AMD Processor The post Comparative analysis of an Intel and AMD Processor appeared first on Data Science Central.  ( 3 min )
    AI And Its Impact On Diversity And Inclusion
    How does artificial intelligence Diversity, Equity, and Inclusion (DEI) fit into the technological stack of daily companies? Fostering a diverse workforce is a very human problem. The cry for a halt to race prejudice has become deafening, and it’s increasingly a decisive factor for talent when weighing job offers and purchases. To stay up with the… Read More »AI And Its Impact On Diversity And Inclusion The post AI And Its Impact On Diversity And Inclusion appeared first on Data Science Central.  ( 4 min )
    How Data Intelligence Platforms Promote Business Success
    Understanding consumer behavior is becoming more and more critical as businesses seek to find innovative ways to survive and thrive in a period of constant change. In the last few years, the market has seen significant changes in the way people shop, travel, dine and purchase goods. As a business, when it comes to understanding… Read More »How Data Intelligence Platforms Promote Business Success The post How Data Intelligence Platforms Promote Business Success appeared first on Data Science Central.  ( 4 min )
    Exploring AI labeling for children’s products
    I read an article from the world economic forum which proposed an AI labeling system for AI products designed for children Today, for the first time, children are growing up in a world shaped by artificial intelligence (AI) and decisions are being made for children implicitly by AI.  Algorithms need data that is collected and… Read More »Exploring AI labeling for children’s products The post Exploring AI labeling for children’s products appeared first on Data Science Central.  ( 3 min )
  • Open

    Need project suggestions
    I’ve been running circles in tutorial purgatory and I want to get out of it with sone projects. Anyone has any suggestions? Guided ones would be nice. For unguided ones, could you please provide source links/hints? submitted by /u/HellVollhart [link] [comments]  ( 1 min )
    Agents learns policy when sampling last episode from replay buffer, but don't when randomly sampling from the replay buffer
    Hi all. I've been stuck on this problem for a while and I thought I might be able to find some help here. Any kind of assistance would be greatly appreciated. My setup is as follows. I have an environment with 3 agents. All 3 agents have a single policy network, and it is based on CommNet. My goal is to implement a replay buffer for this environment. I verified that my replay buffer logic is good. I tried running 3 different types of runs: Normal on-policy run: The agents perform an episode, and at the end of each episode the data (such as the states, actions, etc) from this episode are used to calculate the loss Using just the last episode from the replay buffer: The agents perform an episode, and the data is stored in the replay buffer. At the end of each episode, the last episode is sampled from the replay buffer (which is the episode that was just performed). This is just to confirm that my replay buffer is working properly, and the reward curve for this case matches that from (1). Using 1 random episode from the replay buffer: The agents perform an episode, and the data is stored in the replay buffer. At the end of each episode, a random episode is sampled from the replay buffer and used to calculate the loss. The performance is terrible in this case, and the environment times out each time For some reason, as soon as I turn on random sampling, progress is really bad. I'm sorry to pose such an open-ended question, but what are some things I could check to pinpoint the source of this problem? What could be a reason as to why performance is as expected when just sampling the last episode, whereas it is terrible when randomly sampling episodes? I've tried some things thus far but nothing has worked, and I turned to this community in hopes of getting some help. I'm new to the area of reinforcement learning, so I would be very grateful for any kind of help you can offer. Thanks in advance submitted by /u/lebr0n99 [link] [comments]  ( 3 min )
    PPO sample correlation?
    Hi, I'm wondering if the PPO algorithm can solve the sample correlation problem of on-policy algorithm in training. PPO uses successive samples to compute GAE, doesn't the sample correlation occurring here interfere with learning? submitted by /u/noisemastar [link] [comments]
    [P] AutoML-Conf Competition: DAC4AutoML
    submitted by /u/catsortion [link] [comments]  ( 1 min )
    Temporal Difference Learning for Model Predictive Control
    submitted by /u/bendee983 [link] [comments]
    Why is there no rollout monitoring for this CustomEnv (on the right) ?
    ​ Output from using model.learn(env) on both Envs On the left I have a simple dummy CustomEnv (Using Stable-Baselines3 with Gym) for testing, and on the right I have my actual CustomEnv that I am working on in a project. As you can see, the dummy environment gives me the rollout monitoring, whereas there is no rollout monitoring for the actual environment (just time + train statistics/monitoring). I am using very similar code when setting up the training of the model, however the complexity of the actual model is significantly higher than the dummy. In theory, the complexity of the environment shouldnt make a big difference to the monitoring right? All of the key parts are still there (reward function, step function, reset function etc.). In both cases it says that the environments are being wrapped by the 'Moniter' wrapper so that cant be it. Does anyone know why this might be happening? submitted by /u/C_BearHill [link] [comments]  ( 1 min )
    Value Iteration in Car Racing V1
    I’m working on Q table learning model for OpenAI’s. I have everything done in regards to a basic agent, but I’m unsure how I’m supposed to use the box data for action space and observance space, to populate a q table? Or is this approach incorrect? Car Racing doesn’t have a P (probability) call so I’m not sure how else I would do value iteration. submitted by /u/Dzartovian94 [link] [comments]  ( 1 min )
    Action Spaces all landing to zero probability in few steps
    Hey guys, I am new to RL and walking through the Keras implementation of Actor Critic. ​ As a variant of it, I am trying to learn the strategy for WORDLE. However, after a few runs, my action spaces all go down to zero. Not sure what's happening. Could someone have any insights or pointers? ​ Attaching my code for reference. ​ Thanks import pandas as pd import numpy as np import random import string import random import tensorflow as tf from tensorflow import keras from tensorflow.keras import layers # Configuration parameters for the whole setup gamma = 0.9 # Discount factor for past rewards max_runs = 10000 eps = np.finfo(np.float32).eps.item() # Smallest number such that 1.0 + eps != 1.0 my_file = open("", "r") content = my_file.read() content = lis…  ( 3 min )
    Any RL-related conferences right after NeurIPS 22’?
    In case my NeurIPS submission rejected, lol. submitted by /u/Blasphemer666 [link] [comments]  ( 1 min )
  • Open

    Robots dress humans without the full picture
    MIT researchers design a robot that has a trick or two up its sleeve.  ( 6 min )
  • Open

    Reproducibility in Deep Learning and Smooth Activations
    Posted by Gil Shamir and Dong Lin, Research Software Engineers, Google Research Ever queried a recommender system and found that the same search only a few moments later or on a different device yields very different results? This is not uncommon and can be frustrating if a person is looking for something specific. As a designer of such a system, it is also not uncommon for the metrics measured to change from design and testing to deployment, bringing into question the utility of the experimental testing phase. Some level of such irreproducibility can be expected as the world changes and new models are deployed. However, this also happens regularly as requests hit duplicates of the same model or models are being refreshed. Lack of replicability, where researchers are unable to reproduce…  ( 9 min )
  • Open

    Customize the Amazon SageMaker XGBoost algorithm container
    The built-in Amazon SageMaker XGBoost algorithm provides a managed container to run the popular XGBoost machine learning (ML) framework, with added convenience of supporting advanced training or inference features like distributed training, dataset sharding for large-scale datasets, A/B model testing, or multi-model inference endpoints. You can also extend this powerful algorithm to accommodate different requirements. […]  ( 5 min )
    Detect adversarial inputs using Amazon SageMaker Model Monitor and Amazon SageMaker Debugger
    Research over the past few years has shown that machine learning (ML) models are vulnerable to adversarial inputs, where an adversary can craft inputs to strategically alter the model’s output (in image classification, speech recognition, or fraud detection). For example, imagine you have deployed a model that identifies your employees based on images of their […]  ( 12 min )
  • Open

    Unreal Engine and NVIDIA: From One Generation to the Next
    Square/Enix presents the fictional city of Midgar in Final Fantasy VII Remake at a filmic level of detail. Epic’s Fortnite bathes its environments in ray-traced sunlight, simulating how light bounces in the real world. And artists at Lucasfilm revolutionized virtual production techniques in The Mandalorian, using synchronized NVIDIA RTX GPUs to drive pixels on LED Read article > The post Unreal Engine and NVIDIA: From One Generation to the Next appeared first on NVIDIA Blog.  ( 4 min )
    Green Teams Achieve the Dream: NVIDIA Announces NPN Americas Partners of the Year
    A dozen companies today received NVIDIA’s highest award for partners, recognizing their impact on AI education and adoption across such industries as education, federal, healthcare and technology. The winners of the 2021 NPN Americas Partner of the Year Awards have created a profound impact on AI by helping customers meet the demands of recommender systems, Read article > The post Green Teams Achieve the Dream: NVIDIA Announces NPN Americas Partners of the Year appeared first on NVIDIA Blog.  ( 4 min )
  • Open

    Bounding zeros of an analytic function
    The previous post looked at the problem of finding the zeros of a cubic polynomial. Assuming we’re going to use a numerical method to calculate the zero, the hard part is knowing where to tell the numerical method to look. That post showed how to use a change of variables to guarantee that the polynomial […] Bounding zeros of an analytic function first appeared on John D. Cook.  ( 2 min )
    Numerically finding roots of a cubic
    The analog of the quadratic formula for cubic equations is cumbersome. A lot of people naturally say “Forget all that. If I need to find the roots of a cubic, I’ll just use a numerical method like Newton’s method.” Sounds good. Where to start? But how do you know where to look for the roots? […] Numerically finding roots of a cubic first appeared on John D. Cook.  ( 3 min )
  • Open

    Sparse Multi-Channel Variational Autoencoder for the Joint Analysis of Heterogeneous Data
    Highlights Extend the VAE framework to work with heterogeneous, i.e. multi-modal, data by projecting all “channels” to a common latent representation; Use variational dropout to learn sparse representation, which are useful to discover in an unsupervised manner the optimal number of dimensions, i.e. the number of ground truth generative factors.  ( 2 min )

  • Open

    Mathematics and piano tuning
    The following is a slightly edited version of a Twitter thread on @AlgebraFact. The lowest C on a piano is called C1 in scientific pitch notation. The C one octave up is C2 and so forth. Middle C is C4. The frequency of Cn is approximately 2n+4 Hz. This would be exact if C0 were […] Mathematics and piano tuning first appeared on John D. Cook.  ( 2 min )
    Computing functions of roots without computing roots
    Once in a while it’s necessary to calculate some function of the roots of a polynomial, and it may be possible to do this without first calculating the roots. Quadratics The quadratic formula gives explicit solutions to the equation The two solutions for x are where The awkward part is taking the square root of […] Computing functions of roots without computing roots first appeared on John D. Cook.  ( 3 min )
    FWHM for a quadratic
    This post contains a derives a result I needed recently. The derivation is simple but a little tedious, so I wanted to save it in case I need it again. Full width half maximum A common way to measure the width of a function peak in a function f(x) is to find the place x0 […] FWHM for a quadratic first appeared on John D. Cook.  ( 2 min )
  • Open

    PPO Alg confusion
    As I read the paper and several tutorials, I am quite confused about the details. I see many implementations scale the running cumulative discounted reward in a certain way. However, each of them does it in a different way. let R be the running cumulative discounted reward, which is considered the best? Or is there a source of the method to use? implementations I saw from different places include: directly use R to calculate the advantage and training value network (most PPO tutorials use this) use (R / std(R)), where std is the mini-batch standard deviation use (R / std(R)), where std is the running standard deviation use ((R - mean(R)) / std(R)), where both mean and std are mini-batch wise use ((R - mean(R)) / std(R)), where both mean and std are running stats. do the above and clip to a certain range ([-10, 10] or [-1, 1]) I also see several different ways for the value network, let V be the output of the value network: output raw logit, without any scaling/output activation (most PPO tutorials use this) output raw logit, but use the same scaling as discussed above for running cumulative discounted reward, for example, if return value is (R / std(R)), value output will be (V / std(R)) do the same as 2, but use the stat of V instead of R for scaling, for example, if the return value is (R / std(R)), the value output will be (V / std(V)) output with tanh activation at the last layer output with tanh activation at the last layer, and multiply by a constant to match the range of the return any help would be appreciated, thanks! submitted by /u/seermer [link] [comments]  ( 2 min )
    I’m completely new to RL and will be building my first model as part of my degree-ending project. Do you have any tips you can provide?
    Hello all, As the title describes, I’ll be making my first model as part of my final project. I still have a pretty high-level understanding of everything, so forgive any inaccuracies as I describe what I’m going for. The problem I’m attempting to solve is known as the traveling salesman problem. Essentially, a route needs to be formulated that stops at n given locations. Finding the most efficient route with many stops algorithmically is impractical because the number of possible routes increases exponentially with each added location. The environment will simulate travel on city roads. Speed will be a constant, set to whatever the roads speed limit is. I am using .pbf format vector GIS data from OSM so that the environment consists of real-world pathways. I’m using GeoPandas and Pyrosm to work with the data, and I’m collecting nodes for the location of gas stations so that the environment can simulate needing to fuel the vehicle. Gas price will be a constant, as well as vehicle fuel-efficiency. Scoring will be based on the calculated time it would take to complete a route and the calculated cost (in gas). The goal will be to find the most efficient route to take when n = some large number. I’ve never worked with spatial data either, so I’m not sure what kind of challenges that poses. I worry that adding nodes for the locations of gas stations might be difficult. I’m also wondering if I’m better off using Tensorflow and Keras for this, but I’m not really aware of all the technical considerations I should be making before deciding on that. Do you have any tips that might help me out? Solutions to problems I haven’t hit just yet, but likely will? Thanks for your help! submitted by /u/professorDissociate [link] [comments]  ( 2 min )
    Best implementations for extensibility?
    As far as I am aware, StableBaselines3 is the gold standard for reliable implementations of most popular / SOTA deep RL methods. However working with them in the past, I don't find them to be the most usable when looking for extensibility (making changes to the provided implementations) due to how the code base is structured in the behind the scenes (inheritance, lots of helper methods & utilities, etc.). For example, if I wish to change some portion of a method's training update with SB3 it would probably involve overloading a class method before initialization, making sure al the untouched portions of the original method are carried over, etc. Could anyone point me in the direction of any implementations that are more workable from the perspective of extensibility? Ideally implementations that are largely self contained to a single class / file, aren't heavily abstracted aware across multiple interfaces, don't rely heavily on utility functions, etc. submitted by /u/Farconion [link] [comments]  ( 1 min )
    Is it possible to use inspect.getcallargs to convert *args and **kwargs to a canonical kwarg representation in RL?
    Given a NN class, is there something specific we need to care of when converting *args and **kwargs to a canonical kwarg representation? I ask this because in this code from Google (https://github.com/google-research/google-research/blob/c56b47713b08c95ad427d5f93ee0dbb9ad008964/social_rl/multiagent_tfagents/joint_attention/attention_networks.py#L557) they use a TFDecorator-aware replacement for inspect.getcallargs, instead of using getcallargs directly. So my questions are: - Is it possible to use inspect.getcallargs to convert *args and **kwargs to a canonical kwarg representation? - If no, is there an equivalent in PyTorch? I couldn't find any, so I was wondering how people go about that. submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    openAI gym return done==True but not seeing goal is reached
    Hi all, I am running some starter code from openAI(FetchReach-v1, FetchPush-v1) gym with env.action_space.sample(). But I don't see the goal is actually achieved when done returned is True. I copied the code from here (https://openai.com/blog/ingredients-for-robotics-research/). I even let it sleep every step to watch more closely. Another related thing that I can't explain is that it always returns done==True rather quickly with very few sampled actions. These all make me worried about using it as my task environment. submitted by /u/AnimatorRemarkable20 [link] [comments]  ( 1 min )
    Which elective: Monte Carlo Simulation or Computational Learning Theory?
    Hello /r/reinforcementlearning. I have to choose electives pretty soon, and as i am interested in reinforcement learning, I wanted to know which of these would be the most beneficial. Monte Carlo Simulation Computational Learning Theory The year after I will also take a course on Reinforcement Learning, but it has not been created yet. Note: I can also take both if recommended, if I do so, I will take one of the courses before taking the RL course, and the other would be at the same time. Some further thought I've had: CLT includes Bandits, which is surely were useful to know, but it seems to be only a rather small part, and I'm unsure whether all the other topics like PAC Learning and Rademacher Bounds are useful? MC is more practical while CLT is more theoretic (Apparently VERY theoretic according to the course description above). I am not afraid of theoretic courses, but I struggle more with them than more practical courses. The sentiment around the MC course, is that it is pretty good. I don't know anyone who have taken the CLT course. If I choose both, which order would you take them in? submitted by /u/John_Hitler [link] [comments]  ( 2 min )
    Ray RL lib observations normalized?
    Hey i am using the RL lib from ray and i don't know if the observations automatically normalized by the lib or not? By creating a costum environment ray wants you to create an observationspace. That would be a gym box in my case. Anyway idk the exact high and low values. My values lay between -1 and 1 more or less. My fear is now that ray would normalize the Observation values to a new range although they are already processed. Does ray normalized observationspace? If yes how can i turn it off? Thanks! submitted by /u/Willing-Classroom735 [link] [comments]  ( 1 min )
    [CfP] EvoRL @ GECCO 2022. One week before the deadline!
    CALL FOR PAPERS EvoRL 2022 Evolutionary Reinforcement Learning workshop at GECCO 2022, July 9-13, Boston, USA In recent years reinforcement learning (RL) has received a lot of attention thanks to its performance and ability to address complex tasks. At the same time, multiple recent papers, notably work from OpenAI, have shown that evolution strategies (ES) can be competitive with standard RL algorithms on some problems while being simpler and more scalable. Similar results were obtained by researchers from Uber, this time using a gradient-free genetic algorithm (GA) to train deep neural networks on complex control tasks. Moreover, recent research in the field of evolutionary algorithms (EA) has led to the development of algorithms like Novelty Search and Quality Diversity, capable of…  ( 2 min )
  • Open

    [D] Why do we still teach support vector machines?
    Honest question: are there any applications for which SVMs are the best choice? In my experience, no one seems to use this methodology anymore, though maybe I'm wrong. It just kinda feels like teaching people how to use a slide rule when everyone has calculators. submitted by /u/WartimeHotTot [link] [comments]  ( 3 min )
    [D] Paper Explained - Continual Backprop: Stochastic Gradient Descent with Persistent Randomness
    https://youtu.be/zEMOX3Di2Tc This paper finds what seems to be a new phenomenon when working in the continual learning/life-long learning domain. If new tasks are continually introduced to an agent, it seems to loose it's ability to learn the more time progresses. Intuitively it's similar to this idea that "an old dog can't learn new tricks". They propose a fairly simple method of overcoming this limitation that involves resetting weights that are not contributing much to the outcome of the network. They call the method Continual Backprop. ​ Outline: 0:00 - Overview 2:00 - Paper Intro 2:53 - Problems & Environments 8:11 - Plasticity Decay Experiments 11:45 - Continual Backprop Explained 15:54 - Continual Backprop Experiments 22:00 - Extra Interesting Experiments 25:34 - Summary ​ Paper link: https://arxiv.org/abs/2108.06325 submitted by /u/SlickBlueML [link] [comments]  ( 1 min )
    [R] Google's 540B (Dense) model Pathways LLM, "Unlocks" new tasks proportional to scale
    Blog: https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html Paper: https://goo.gle/palm-paper - AFAIK from the Blogpost, Scaling laws still hold up (i.e not yet plateaued) - New transfer learning capabilities, outperforms fine-tuned models with 50x less data (Codex-12B) - The interesting part is how it meta-learns techy geeky jokes and is able to correlate concepts, and explain jokes suggesting starting doing a bit more meta-learning than GPT3 ever could.... But still not enough to generate decent ones (though the joke wasn't particularly humorous, so I may be underestimating) SoTA on various tasks, chain-of-thought-reasoning still holds up to scaling and outperforms some reasoning benchmarks, BIG-bench sees a huge improvement and general LLM thingys :) submitted by /u/Competitive-Rub-1958 [link] [comments]  ( 4 min )
    [R] Minimum Description Length Recurrent Neural Networks
    https://arxiv.org/abs/2111.00600 https://preview.redd.it/l6dni0007jr81.png?width=4888&format=png&auto=webp&s=82c7c9b9433b79c66318090ff85e4535c35ddb18 submitted by /u/inland-1 [link] [comments]  ( 1 min )
    [R] CfP EvoRL @ GECCO 2022. One week before the deadline!
    CALL FOR PAPERS EvoRL 2022 Evolutionary Reinforcement Learning workshop at GECCO 2022, July 9-13, Boston, USA In recent years reinforcement learning (RL) has received a lot of attention thanks to its performance and ability to address complex tasks. At the same time, multiple recent papers, notably work from OpenAI, have shown that evolution strategies (ES) can be competitive with standard RL algorithms on some problems while being simpler and more scalable. Similar results were obtained by researchers from Uber, this time using a gradient-free genetic algorithm (GA) to train deep neural networks on complex control tasks. Moreover, recent research in the field of evolutionary algorithms (EA) has led to the development of algorithms like Novelty Search and Quality Diversity, capable of…  ( 2 min )
  • Open

    Logging in Python
    Logging is a way to store information about your script and track events that occur. When writing any complex script […] The post Logging in Python appeared first on Machine Learning Mastery.  ( 22 min )
  • Open

    Build an MLOps sentiment analysis pipeline using Amazon SageMaker Ground Truth and Databricks MLflow
    As more organizations move to machine learning (ML) to drive deeper insights, two key stumbling blocks they run into are labeling and lifecycle management. Labeling is the identification of data and adding labels to provide context so an ML model can learn from it. Labels might indicate a phrase in an audio file, a car […]  ( 7 min )
    Enable Amazon Kendra search for a scanned or image-based text document
    Amazon Kendra is an intelligent search service powered by machine learning (ML). Amazon Kendra reimagines search for your websites and applications so your employees and customers can easily find the content they’re looking for, even when it’s scattered across multiple locations and content repositories within your organization. Amazon Kendra supports a variety of document formats, […]  ( 5 min )
    Interpret caller input using grammar slot types in Amazon Lex
    Customer service calls require customer agents to have the customer’s account information to process the caller’s request. For example, to provide a status on an insurance claim, the support agent needs policy holder information such as the policy ID and claim number. Such information is often collected in the interactive voice response (IVR) flow at […]  ( 6 min )
  • Open

    School of Engineering welcomes Thomas Tull as visiting innovation scholar
    Primary focus will be to advance and promote technology, innovation, and entrepreneurship across the school.  ( 4 min )
  • Open

    Microsoft Researchers Introduce ‘Jigsaw’: An AI Tool To Augment Large Language Models (GPT-3, Codex, etc.) By Deploying Post-Processing Techniques That Understand The Programs’ Syntax And Semantics
    GPT-3, Codex, and other sizable pre-trained language models can be adjusted to create code from natural language descriptions of programmer intent. Every developer in the world might benefit from these automated models, which have the potential to increase productivity. However, because the models may fail to understand program semantics, the quality of the generated code cannot be guaranteed. Microsoft researchers introduce Jigsaw, a new tool that can help these big language models perform better. Jigsaw is a Python Pandas API code generator that accepts multi-modal inputs. Jigsaw uses post-processing techniques to decipher the syntax and semantics of programs and then uses user feedback to improve future performance. Continue Reading Paper: https://arxiv.org/pdf/2112.02969.pdf Dataset: https://github.com/microsoft/JigsawDataset ​ https://i.redd.it/x223r5qu0kr81.gif submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance
    submitted by /u/nick7566 [link] [comments]
    Generative AI+Alex Grey = xxxxxoooooooo (Disco Diffusion)
    submitted by /u/JoshGrambo [link] [comments]
    UiPath extract Tables from PDF (use case) (PDF table)
    submitted by /u/Cristi_UiPath [link] [comments]
    New RL technique achieves superior performance in control tasks
    submitted by /u/bendee983 [link] [comments]
  • Open

    Data Augmented Healthcare Part 2
    In the previous part, we discussed the current state of data imaging tools in healthcare and the future applications of these technologies. While increased access to information is invaluable to physicians, they can still be limited by their own ability to interpret, or the physical limitations of their surgical ability. In addition to augmenting the… Read More »Data Augmented Healthcare Part 2 The post Data Augmented Healthcare Part 2 appeared first on Data Science Central.  ( 3 min )
  • Open

    Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance
    Posted by Sharan Narang and Aakanksha Chowdhery, Software Engineers, Google Research In recent years, large neural networks trained for language understanding and generation have achieved impressive results across a wide range of tasks. GPT-3 first showed that large language models (LLMs) can be used for few-shot learning and can achieve impressive results without large-scale task-specific data collection or model parameter updating. More recent LLMs, such as GLaM, LaMDA, Gopher, and Megatron-Turing NLG, achieved state-of-the-art few-shot results on many tasks by scaling model size, using sparsely activated modules, and training on larger datasets from more diverse sources. Yet much work remains in understanding the capabilities that emerge with few-shot learning as we push the limits of …  ( 9 min )
  • Open

    Meet the Omnivore: Videographer Makes Digital Walls, Virtual Homes Pop With NVIDIA Omniverse
    Pekka Varis’s artistry has come a long way from his early days as a self-styled “punk activist” who spray painted during the “old school days of hip hop in Finland.” The post Meet the Omnivore: Videographer Makes Digital Walls, Virtual Homes Pop With NVIDIA Omniverse appeared first on NVIDIA Blog.  ( 3 min )
  • Open

    Composing Music with Neural Networks
    Hey guys, ​ I really love creating music algorithmically, which is why I have dedicated my master’s thesis to the generation of music patterns by the use of artificial intelligence. In the course of the past 12 months, I have programmed a deep recurrent neural network in Python, which I have trained on 200 self-made music patterns in order to generate somehow novel motifs. ​ In order to evaluate my model, I have set up a short online listening experiment. I’m looking for test subjects right now, so if you are interested in participating, I would really appreciate it. The listening experiment will take you just about 5 to 8 minutes to complete and the only thing you need is a pair of headphones. You can partake on your computer as well as on your smartphone or tablet. ​ Here is the link which gets you to the listening experiment: https://forms.gle/rx1FUQ7RgpjMu1xx9 ​ Thank you very much for taking the time to help me reach my goal. Really appreciate it. submitted by /u/JosephdeLaquinta [link] [comments]  ( 1 min )
2022-05-04T00:59:20.296Z osmosfeed 1.14.4